Server Admin Log/Archive 67

2023-06-30

22:34 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2013.*
22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2014.*
22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2015.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2016.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2017.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2018.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2019.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2020.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
22:09 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
22:08 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
22:08 bking@deploy1002: deploy aborted: 0.3.124 (duration: 00m 00s)
22:08 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
22:00 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
22:00 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
21:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
21:58 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
21:07 jhathaway: debugging a cert issue on pki1001.eqiad
21:03 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
21:00 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:59 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
20:29 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:29 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:55 mutante: please hold code changes and deploys if using gitlab - upgrade in progress
19:53 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
19:26 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:25 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:25 brennen@deploy1002: Finished scap: Backport for Fix bug in opening dialog (T340816) (duration: 08m 37s)
18:20 mutante: upgrading gitlab on gitlab-replica.wikimedia.org
18:19 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
18:18 brennen@deploy1002: brennen and jforrester: Backport for Fix bug in opening dialog (T340816) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
18:16 brennen@deploy1002: Started scap: Backport for Fix bug in opening dialog (T340816)
18:06 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
16:59 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
16:27 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
16:27 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
16:26 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
16:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
16:25 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
16:25 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
16:09 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1149.eqiad.wmnet with OS bullseye
15:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:35 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:21 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:43 jiji@cumin1001: conftool action : γετ; selector: service=kube-apiserver
14:42 sbassett: Deployed updated mitigation for T337593
14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
14:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
13:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
13:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
12:39 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
12:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
12:22 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
12:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubestagemaster2002.codfw.wmnet with OS bullseye
12:17 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
12:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
12:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
12:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:09 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
11:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
11:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
11:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
11:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
11:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
11:31 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
11:28 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
11:28 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
11:28 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
11:23 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
11:22 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
11:15 jayme: published image docker-registry.discovery.wmnet/envoy:1.18.3-2-s3 and docker-registry.discovery.wmnet/envoy-future:1.23.10-1-s1 - T300324
11:14 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
11:14 jayme: imported envoyproxy 1.23.10 to component/envoy-future in buster-wikimedia - T300324
11:05 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
11:05 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
11:05 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
11:05 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
11:04 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
10:45 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
10:24 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:20 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:15 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
10:14 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
10:13 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
10:12 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
08:00 jayme: rolled back envoyproxy package in buster-wikimedia component/envoy-future to 1.18.3-1 - T300324
07:52 jayme: removed docker-registry.discovery.wmnet/envoy-future:1.26.1-1 - T300324
06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on urldownloader[2001-2002].wikimedia.org with reason: pending decom
06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on urldownloader[2001-2002].wikimedia.org with reason: pending decom
06:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on urldownloader[1001-1002].wikimedia.org with reason: Setup in progress
06:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on urldownloader[1001-1002].wikimedia.org with reason: Setup in progress

2023-06-29

21:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
21:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
21:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
21:18 samtar@deploy1002: Finished scap: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763) (duration: 08m 26s)
21:11 samtar@deploy1002: samtar: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
21:10 samtar@deploy1002: Started scap: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763)
20:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
20:01 mutante: contint* servers: restarted apache after deploying gerrit:932435
19:50 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
19:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:30 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
19:30 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
19:29 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
19:29 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
19:29 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
19:28 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
19:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
19:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
19:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
19:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
18:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Restarting to upgraded JVM - eevans@cumin1001
18:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[1-3]*: Restarting to upgraded JVM - eevans@cumin1001
18:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
18:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Restarting to upgraded JVM - eevans@cumin1001
18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.15 refs T340243
18:15 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[1-3]*: Restarting to upgraded JVM - eevans@cumin1001
18:06 brennen: train 1.41.0-wmf.15 (T340243): no current blockers, logs calm, rolling to all wikis
17:46 taavi@deploy1002: Finished scap: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757) (duration: 11m 06s)
17:38 taavi@deploy1002: matmarex and taavi: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
17:35 taavi@deploy1002: Started scap: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757)
17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
17:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
17:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:06 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:06 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:05 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:04 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
16:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
16:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
16:50 jiji@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2002.codfw.wmnet
16:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2002.codfw.wmnet with OS bullseye
16:41 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
16:22 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:21 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:18 mutante: releases1003 - re-enabling puppet after recent webserver debugging
16:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
16:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
16:16 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
16:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
16:16 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
16:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:16 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
16:12 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
16:11 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
16:10 sukhe: systemctl restart bird.service on doh2002
16:04 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
16:04 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
16:04 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:03 klausman@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:03 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
15:59 jiji@cumin1001: START - Cookbook sre.dns.netbox
15:59 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2002.codfw.wmnet
15:49 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
15:49 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
15:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
15:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
15:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1485.eqiad.wmnet
15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1485.eqiad.wmnet
15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1484.eqiad.wmnet
15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1484.eqiad.wmnet
15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1483.eqiad.wmnet
15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1483.eqiad.wmnet
15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1482.eqiad.wmnet
15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1482.eqiad.wmnet
15:31 claime: Pooled mw148[2-6].eqiad.wmnet as jobrunners - T329366
15:29 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw148[2-6].eqiad.wmnet,cluster=jobrunner
15:27 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
15:25 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw148[2-6].eqiad.wmnet
15:25 cgoubert@cumin1001: conftool action : set/weight=10; selector: name=mw148[2-6].eqiad.wmnet
15:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1484.eqiad.wmnet with OS buster
15:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1485.eqiad.wmnet with OS buster
15:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1483.eqiad.wmnet with OS buster
15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1482.eqiad.wmnet with OS buster
15:16 moritzm: installing Java 8 security updates on sessionstore/codfw
15:06 Daimona: Creating new DB tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T340000
14:54 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1486.eqiad.wmnet with reason: host reimage
14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
14:51 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
14:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
14:47 jayme: published image docker-registry.discovery.wmnet/envoy-future:1.26.1-1 - T300324
14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1482.eqiad.wmnet with reason: host reimage
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1486.eqiad.wmnet with reason: host reimage
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1482.eqiad.wmnet with reason: host reimage
14:31 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1484.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1486.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1485.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1483.eqiad.wmnet with OS buster
14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1482.eqiad.wmnet with OS buster
14:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw148[2-6].eqiad.wmnet
14:21 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: pick up Java 8 sec updates - jmm@cumin2002
14:20 claime: Depooling mw148[2-6].eqiad.wmnet from api_appserver to move them to jobrunners - T329366
14:19 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster2002.codfw.wmnet
14:19 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
14:19 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
14:19 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
14:18 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
14:13 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:10 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
14:10 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
14:10 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:10 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
14:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
14:10 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
14:07 jiji@cumin1001: START - Cookbook sre.dns.netbox
14:07 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2002.codfw.wmnet
14:04 jayme: imported envoyproxy 1.26.1 to component/envoy-future in buster-wikimedia - T300324
14:04 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
14:03 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
14:02 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
14:02 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
14:02 taavi: UTC afternoon backports done
14:01 taavi@deploy1002: Finished scap: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (duration: 12m 01s)
14:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:00 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
13:51 taavi@deploy1002: taavi and reedy: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:49 taavi@deploy1002: Started scap: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"
13:44 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:44 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
13:44 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
13:43 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
13:40 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:40 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:40 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
13:39 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
13:38 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:38 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
13:38 moritzm: installing bind9 security updates (tools/libs only)
13:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
13:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
13:35 daniel@deploy1002: Finished scap: Backport for Disable PC writes for parsoid endpoints (T339867) (duration: 07m 07s)
13:32 moritzm: failover ganeti master in codfw to ganeti2020
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
13:29 daniel@deploy1002: daniel: Backport for Disable PC writes for parsoid endpoints (T339867) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:28 daniel@deploy1002: Started scap: Backport for Disable PC writes for parsoid endpoints (T339867)
13:27 taavi@deploy1002: Finished scap: Backport for Only send 1 suggestion per section (duration: 07m 08s)
13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
13:22 taavi@deploy1002: mlitn and taavi: Backport for Only send 1 suggestion per section synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:20 taavi@deploy1002: Started scap: Backport for Only send 1 suggestion per section
13:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
13:14 taavi@deploy1002: Finished scap: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697) (duration: 09m 05s)
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
13:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:10 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
13:07 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
13:07 taavi@deploy1002: taavi and func: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:07 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
13:06 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
13:05 taavi@deploy1002: Started scap: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697)
13:05 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:04 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
13:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
13:00 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:00 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
12:58 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1002.eqiad.wmnet
12:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
12:56 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1002.eqiad.wmnet
12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1002.eqiad.wmnet with OS buster
12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1001"
12:53 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
12:50 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: pick up Java 8 sec updates - jmm@cumin2002
12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
12:48 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:46 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:46 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
12:43 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
12:42 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
12:42 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
12:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
12:34 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: pick up Java 8 sec updates - jmm@cumin2002
12:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
12:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
11:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
11:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
11:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
11:31 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
11:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudservices2005-dev.wikimedia.org
11:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudservices2005-dev.wikimedia.org
11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster1002.eqiad.wmnet
11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
11:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
11:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
11:09 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
11:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
11:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: pick up Java 8 sec updates - jmm@cumin2002
11:02 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1001"
11:02 moritzm: installing Java 8 security updates
11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
10:59 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:57 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
10:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
10:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
10:40 claime: vrt-wiki.wikimedia.org now hosted on mw-on-k8s - T340549
10:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:37 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
10:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
10:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
10:34 claime: Running puppet on cp-text trafficservers - T340549
10:32 claime: Redirect vrt-wiki.wikimedia.org to mw-on-k8s - T340549
10:32 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
10:25 claime: office.wikimedia.org now hosted on mw-on-k8s - T337490
10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
10:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
10:24 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
10:24 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster1002.eqiad.wmnet on all recursors
10:23 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster1002.eqiad.wmnet on all recursors
10:23 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:23 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
10:23 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
10:21 jiji@cumin1001: START - Cookbook sre.dns.netbox
10:21 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster1002.eqiad.wmnet
10:20 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:20 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Delete records created by accident - jiji@cumin1001"
10:19 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Delete records created by accident - jiji@cumin1001"
10:19 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host parse1002.eqiad.wmnet with OS buster
10:18 claime: Running puppet on cp-text trafficservers - T337490
10:18 claime: Redirect office.wikimedia.org to mw-on-k8s - T337490
10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
10:17 jiji@cumin1001: START - Cookbook sre.dns.netbox
10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
10:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:09 jbond: puppetserver1001 added back to puppet-merge
10:09 claime: www.mediawiki.org now hosted on mw-on-k8s - T337490
10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster1002.eqiad.wmnet
10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
10:06 jiji@cumin1001: START - Cookbook sre.dns.netbox
10:06 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster1002.eqiad.wmnet
10:03 claime: Running puppet on cp-text trafficservers - T337490
10:02 claime: Redirect www.mediawiki.org to mw-on-k8s - T337490
10:00 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
09:59 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
09:58 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:58 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for wikikube-staging masters - jiji@cumin1001"
09:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:57 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for wikikube-staging masters - jiji@cumin1001"
09:57 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:53 jiji@cumin1001: START - Cookbook sre.dns.netbox
09:53 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
09:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp
09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
09:36 moritzm: restarting FPM on mw canaries to pick up libx11 updates
09:30 moritzm: installing libx11 security updates
09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
09:22 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
09:21 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp
09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
09:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
08:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
08:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
07:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
07:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
06:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
06:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
01:33 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
01:32 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-06-28

22:51 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
22:50 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
21:14 eileen_: civicrm upgraded from 0a59d203 to 9e04c92d
20:10 brennen@deploy1002: Finished scap: Backport for Revert "Deprecate use of targets" (duration: 07m 23s)
20:05 brennen@deploy1002: jdlrobson and brennen: Backport for Revert "Deprecate use of targets" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:03 brennen@deploy1002: Started scap: Backport for Revert "Deprecate use of targets"
19:46 brennen: train 1.41.0-wmf.15 (T340243): deploying a revert for T127268 related deprecation logspam - this is likely to impinge on upcoming backport window, which currently has no patches. will update when finished.
19:13 mutante: contint1002,2002,2001 - sudo chmod -R g-w /etc/zuul/wikimedia with deploying gerrit:927980 for T338277
19:03 mutante: contint* - temp disabled puppet - deploying gerrit:927980 - related to git cloning zuul config on CI servers
18:20 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.15 refs T340243 (duration: 06m 18s)
18:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.15 refs T340243
18:02 brennen: train 1.41.0-wmf.15 (T340243): no current blockers, rolling to group1.
17:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
17:15 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:15 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-test-coord1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
17:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS bullseye
17:06 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-test-coord1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
17:03 btullis@cumin1001: START - Cookbook sre.dns.netbox
16:57 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
16:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
16:46 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
16:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
16:34 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:33 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS bullseye
16:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS bullseye
16:23 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:23 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:22 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:20 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:19 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:18 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:11 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
16:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:10 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:07 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
16:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:54 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS bullseye
15:54 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:53 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
15:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
15:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS bullseye
15:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
15:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
15:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2031.codfw.wmnet
15:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
15:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:37 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
15:34 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2032.codfw.wmnet
15:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
15:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
15:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
15:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
15:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
15:29 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
15:28 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
15:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
15:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
15:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS bullseye
15:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:08 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:06 akosiaris: Disable Vodafone DE BGP peering on cr2-esams to troubleshoot reports from users in Germany
14:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3001.esams.wmnet
14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
14:19 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:08 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
14:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
14:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
14:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
14:06 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
14:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:04 reedy@deploy1002: Synchronized wmf-config/: Various changes (duration: 06m 27s)
14:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:57 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:57 reedy@deploy1002: Synchronized private: I62beb6 (duration: 06m 22s)
13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3001.esams.wmnet
13:54 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:54 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:42 hashar@deploy1002: Finished deploy [gerrit/gerrit@1ae182f]: Fix wm-custom-links to show links in footer again - T340372 (duration: 00m 08s)
13:42 hashar@deploy1002: Started deploy [gerrit/gerrit@1ae182f]: Fix wm-custom-links to show links in footer again - T340372
13:39 moritzm: failover ganeti master in esams to ganeti3003
13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3002.esams.wmnet
13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
13:38 sukhe: sudo cumin 'A:dns-auth' 'enable-puppet "merging CR 926509"'
13:37 jbond: remove puppetserver from puppet-merge
13:36 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:36 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:36 reedy@deploy1002: Finished scap: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org" (duration: 08m 51s)
13:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:35 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
13:30 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:28 reedy@deploy1002: legoktm and reedy: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:28 sukhe: sudo cumin 'A:dns-auth' 'disable-puppet "merging CR 926509"'
13:28 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:27 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:27 reedy@deploy1002: Started scap: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org"
13:26 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:25 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:25 reedy@deploy1002: Finished scap: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617) (duration: 09m 19s)
13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3002.esams.wmnet
13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3003.esams.wmnet
13:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
13:20 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:17 reedy@deploy1002: reedy and lucaswerkmeister-wmde: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:16 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:16 reedy@deploy1002: Started scap: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617)
13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
13:15 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:14 reedy@deploy1002: Finished scap: Backport for eowikisource: Add project namespace alias (T340609) (duration: 08m 18s)
13:12 jbond: add puppetserver to puppet-merge
13:09 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2027*} and A:cp
13:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3003.esams.wmnet
13:07 reedy@deploy1002: reedy and anzx: Backport for eowikisource: Add project namespace alias (T340609) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:05 reedy@deploy1002: Started scap: Backport for eowikisource: Add project namespace alias (T340609)
13:05 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
13:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
13:05 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
12:53 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2027*} and A:cp
12:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027*} and A:cp
12:44 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027*} and A:cp
12:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
12:29 moritzm: failover ganeti master in eqsin to ganeti5007
12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
11:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
11:33 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
11:33 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
11:33 volans@cumin2002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
11:33 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
11:18 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
11:08 claime: Reverting migration to rsync::quickdatacopy for deployment servers - T289857
11:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
11:04 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart to pick up Java 11 - elukey@cumin1001
11:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
11:02 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:55 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:52 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
10:51 claime: Migrating to rsync::quickdatacopy for deployment servers - T289857
10:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:47 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart to pick up Java 11 - elukey@cumin1001
10:47 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up Java 11 - elukey@cumin1001
10:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:42 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
10:41 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:38 fabfur@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_codfw
10:35 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
10:34 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
10:31 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:29 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up Java 11 - elukey@cumin1001
10:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
10:21 hnowlan: disabling puppet on A:cp-text for testing 933508
10:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
10:11 vgutierrez: repool cp4037
10:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
10:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
10:01 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
09:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
09:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 8 hosts with reason: Decommissioning
09:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 8 hosts with reason: Decommissioning
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
09:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
09:09 vgutierrez: depool cp4037 for some ATS tests
09:08 moritzm: failover ganeti master in ulsfo to ganeti4008
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
08:40 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
08:15 marostegui: Failover m5-master to dbproxy1027 T337812
08:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
08:07 marostegui: Failover m2-master to dbproxy1025 T337812
08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
08:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
07:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
07:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
07:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
07:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
07:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
07:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
07:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
07:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
07:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
06:42 marostegui: Failover m1-master to dbproxy1024 T337812
01:37 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 02m 20s)
01:35 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
01:24 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 01m 58s)
01:22 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui

2023-06-27

23:58 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host xhgui2002.codfw.wmnet
23:58 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host xhgui2002.codfw.wmnet with OS bookworm
23:49 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host xhgui1002.eqiad.wmnet
23:49 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host xhgui1002.eqiad.wmnet with OS bookworm
23:43 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on xhgui2002.codfw.wmnet with reason: host reimage
23:40 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on xhgui2002.codfw.wmnet with reason: host reimage
23:34 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on xhgui1002.eqiad.wmnet with reason: host reimage
23:31 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on xhgui1002.eqiad.wmnet with reason: host reimage
23:23 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host xhgui2002.codfw.wmnet with OS bookworm
23:23 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui2002.codfw.wmnet - denisse@cumin1001"
23:22 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui2002.codfw.wmnet - denisse@cumin1001"
23:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) xhgui2002.codfw.wmnet on all recursors
23:22 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache xhgui2002.codfw.wmnet on all recursors
23:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:22 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui2002.codfw.wmnet - denisse@cumin1001"
23:21 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui2002.codfw.wmnet - denisse@cumin1001"
23:20 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host xhgui1002.eqiad.wmnet with OS bookworm
23:20 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
23:19 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
23:19 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) xhgui1002.eqiad.wmnet on all recursors
23:19 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache xhgui1002.eqiad.wmnet on all recursors
23:19 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:19 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
23:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
23:18 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
23:18 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host xhgui2002.codfw.wmnet
23:16 denisse@cumin1001: START - Cookbook sre.dns.netbox
23:16 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host xhgui1002.eqiad.wmnet
22:43 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 01m 27s)
22:42 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
21:50 mutante: prometheus4002 - sudo a2dismod access_compat ; sudo systemctl restart apache2 ; sudo apachectl configtest -> Syntax OK :) - to prove it works without the access_compat module T258686
21:45 mutante: prometheus* - puppet and partially manual restart of apaches after deploying gerrit:932443
20:50 TheresNoTime: close UTC late backport window
20:48 samtar@deploy1002: Finished scap: Backport for Title: Fix exists() assertion in toPageRecord() (T340568) (duration: 06m 52s)
20:43 samtar@deploy1002: matmarex and samtar: Backport for Title: Fix exists() assertion in toPageRecord() (T340568) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:41 samtar@deploy1002: Started scap: Backport for Title: Fix exists() assertion in toPageRecord() (T340568)
20:20 samtar@deploy1002: Finished scap: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193) (duration: 06m 53s)
20:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:16 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:15 samtar@deploy1002: reedy and esanders and samtar: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:13 samtar@deploy1002: Started scap: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193)
20:13 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
20:10 samtar@deploy1002: Finished scap: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497) (duration: 07m 35s)
20:04 samtar@deploy1002: esanders and samtar and matmarex: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:03 samtar@deploy1002: Started scap: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497)
20:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
19:59 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004 (duration: 00m 38s)
19:59 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004
19:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: patch application
19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: patch application
19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: patch application
19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: patch application
19:55 kindrobot@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:54 kindrobot@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:53 kindrobot@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
19:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
19:33 brennen@deploy1002: Finished scap: Backport for Drop redundant targets (T340499) (duration: 07m 51s)
19:27 brennen@deploy1002: brennen: Backport for Drop redundant targets (T340499) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
19:25 brennen@deploy1002: Started scap: Backport for Drop redundant targets (T340499)
19:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
19:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:47 sukhe: upgrade dns6001 to gdnsd 3.99.0~alpha2
18:41 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.15 refs T340243
18:40 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
18:31 brennen@deploy1002: Finished scap: Backport for Display the language button on pages without languages (T315036) (duration: 08m 53s)
18:29 jhathaway: puppet re-enabled, enjoy!
18:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:24 brennen@deploy1002: abi and brennen: Backport for Display the language button on pages without languages (T315036) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
18:22 brennen@deploy1002: Started scap: Backport for Display the language button on pages without languages (T315036)
18:18 jhathaway: disabling puppet to test stdlib upgrade patch
17:45 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:45 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:45 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:45 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:44 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
17:44 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
17:22 brennen@deploy1002: Pruned MediaWiki: 1.41.0-wmf.12 (duration: 02m 05s)
17:20 brennen@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.15 refs T340243 (duration: 42m 56s)
16:49 mutante: webperf1003/2003 restarted apache after deploying gerrit:932441
16:37 brennen@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.15 refs T340243
16:36 brennen: train 1.41.0-wmf.15: re-running scap stage-train (T340243)
16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:51 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:36 jbond: puppet-merge fixed again
15:35 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
15:34 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
15:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
15:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
15:33 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
15:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
15:32 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
15:32 root@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
15:24 root@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
15:24 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
15:24 jbond: puppet-merge temporarily broken
15:23 jbond: hi all fyi i have temporarily broken puppet-merge, fix is being done
15:23 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
15:23 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
15:21 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
15:20 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
15:01 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:53 mforns@deploy1002: Finished deploy [airflow-dags/analytics@5e77b01]: (no justification provided) (duration: 00m 10s)
14:52 mforns@deploy1002: Started deploy [airflow-dags/analytics@5e77b01]: (no justification provided)
14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
14:46 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
14:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
14:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
14:23 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
14:21 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
14:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
14:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:04 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
13:32 elukey: expand ml-staging200[12] kubelet partitions - T339231
13:27 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:26 joal@deploy1002: Finished deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7] (duration: 00m 09s)
13:26 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:26 joal@deploy1002: Started deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7]
13:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:06 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
12:57 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
12:57 marostegui: Failover m3-master to dbproxy1026 T337812
11:55 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on enwiki (T339867) (duration: 12m 06s)
11:51 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
11:50 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
11:44 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on enwiki (T339867) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
11:43 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on enwiki (T339867)
11:21 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on dewiki (T339867) (duration: 08m 34s)
11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
11:14 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on dewiki (T339867) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:12 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on dewiki (T339867)
11:08 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2] (duration: 01m 43s)
11:06 joal@deploy1002: Started deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2]
11:06 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2] (duration: 00m 04s)
11:06 joal@deploy1002: Started deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2]
11:04 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2] (duration: 08m 23s)
11:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
10:55 joal@deploy1002: Started deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2]
10:48 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
10:43 hnowlan: disabling puppet on A:cp-text to test rollout of r/929674
10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
10:33 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
10:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
10:30 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
10:30 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
10:29 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
10:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
10:25 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
10:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:10 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:06 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:05 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:03 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1002.eqiad.wmnet with OS bullseye
10:01 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:56 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:56 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:56 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
09:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
09:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: host reimage
09:36 akosiaris@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 16s)
09:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and not P{cp5032*} and A:cp
09:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: host reimage
09:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@1ddd94b] (releasing): (no justification provided) (duration: 00m 51s)
09:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@1ddd94b] (releasing): (no justification provided)
09:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1002.eqiad.wmnet with OS bullseye
09:20 moritzm: installing libvirt bugfix updates from Bullseye point release
09:12 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
09:12 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and not P{cp5032*} and A:cp
09:11 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
09:11 kart_: Updated MinT to 2023-06-27-053706-production (T339896, T340236)
09:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
09:10 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
09:09 vgutierrez: repool cp1082
09:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
09:09 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
09:07 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
09:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
09:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
09:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
08:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet
08:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet
08:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet
08:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet
08:53 akosiaris@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 21s)
08:52 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
08:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
08:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
08:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
08:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
08:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
08:41 kart_: Updated cxserver to 2023-06-27-053435-production (T339105)
08:38 elukey: revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private - T337825
08:33 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 19 hosts
08:33 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 19 hosts
08:32 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
08:32 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 767 hosts
08:32 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
08:32 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts
08:31 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts
08:30 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts
08:29 marostegui: Failover m2-master to dbproxy1022 T337812
08:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
08:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
08:25 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
08:24 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
08:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123) (duration: 16m 17s)
08:03 moritzm: installing openjdk-8 security updates for bullseye
08:02 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:58 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123)
07:54 moritzm: uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster)
07:48 hashar: Restart Zuul due to stuck connection | T340518 | T309376
07:15 elukey: `sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet
06:22 marostegui: Failover m1-master to dbproxy1022 T337812

2023-06-26

23:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery
23:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery
23:07 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
23:02 sbassett: Deployed updated mitigation for T336027
23:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
22:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
22:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
22:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
22:33 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
22:31 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
22:24 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
22:17 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
22:17 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
22:17 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
22:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
21:58 eevans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
21:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
21:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
21:53 eevans@cumin2002: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
21:53 urandom: pooling sessionstore/codfw for bullseye upgrades — T340043
21:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
21:44 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS bullseye
21:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
21:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
21:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
21:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
21:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
21:22 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
21:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
21:18 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
21:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
21:13 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
21:13 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
21:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:02 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye
20:55 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2003.codfw.wmnet with OS bullseye
20:45 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye
20:42 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2001.codfw.wmnet with OS bullseye
20:34 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 31s)
20:33 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004
20:30 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004 (duration: 00m 34s)
20:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: patch application
20:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab2002.codfw.wmnet with reason: patch application
20:30 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004
20:29 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab2002 (duration: 00m 38s)
20:29 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab2002
20:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: patch application
20:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab1004.eqiad.wmnet with reason: patch application
20:27 brennen: deploying minor phabricator updates shortly
20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: first setup
20:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab1004.eqiad.wmnet with reason: first setup
20:18 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
20:16 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
20:00 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
19:49 akosiaris: force puppet run on cp hosts T340483
19:48 akosiaris: revert "Redirect www.mediawiki.org to mw-on-k8s", debugging T340483
19:24 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS bullseye
19:02 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
18:57 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
18:42 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS bullseye
18:38 eevans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
18:33 eevans@cumin2002: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
18:33 urandom: depooling sessionstore/codfw for bullseye upgrades — T340043
18:07 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
18:07 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
18:06 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
18:05 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
18:05 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
18:05 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
18:04 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
18:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
18:03 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
18:03 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@32b4b99]: update dags to use discolytics 0.15.0 (duration: 00m 17s)
18:03 ebernhardson@deploy1002: Started deploy [airflow-dags/search@32b4b99]: update dags to use discolytics 0.15.0
18:02 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
17:53 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
17:53 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
16:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:21 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
16:21 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
16:19 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
16:18 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:52 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
15:45 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
15:41 moritzm: installing Java 8 security updates on stat* hosts
15:28 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:26 sukhe: upgrade dns5003 to gdnsd 3.99.0~alpha2
15:26 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:25 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:11 sukhe: re-enable puppet on P{C:bird::anycast_healthchecker} and finish rolling out CR 922514
15:01 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
15:01 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
15:00 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
15:00 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
14:55 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
14:55 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
14:54 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
14:53 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
14:53 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
14:53 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
14:51 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
14:51 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@7db3f9b]: Fix up attribution name in wm-app-theme.js plugin (duration: 00m 08s)
14:46 hashar@deploy1002: Started deploy [gerrit/gerrit@7db3f9b]: Fix up attribution name in wm-app-theme.js plugin
14:40 sukhe: rolling out CR 922514 to A:durum: T336792
14:40 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
14:40 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:37 sukhe: rolling out CR 922514 to A:dns-auth: T336792
14:32 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:32 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
14:31 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:30 sukhe: rolling out CR 922514 to A:wikidough (-s1 -b30): T336792
14:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:28 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
14:28 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:23 sukhe: restart pdns-rec.service on doh6001 to test systemd binding to anycast-hc
14:19 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main'.
14:17 sukhe: sudo cumin 'P{C:bird::anycast_healthchecker}' 'disable-puppet "merging CR 922514"'
14:16 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
14:06 elukey: move varnishkafka instances in esams to pki
13:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:50 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
13:48 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
13:46 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:45 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:40 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
13:39 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
13:29 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
13:29 sukhe: sudo cumin 'A:dns-auth' 'enable-puppet "merging CR 932248"'
13:26 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on frwiki (T339867) (duration: 10m 20s)
13:25 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
13:22 sukhe: sudo cumin 'A:dns-auth' 'disable-puppet "merging CR 932248"'
13:18 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@b3751e6]: (no justification provided) (duration: 00m 09s)
13:18 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@b3751e6]: (no justification provided)
13:17 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on frwiki (T339867) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:17 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
13:15 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on frwiki (T339867)
13:05 claime: parse1012 pooled inactive for flapping investigation
13:03 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1012.eqiad.wmnet
11:59 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices2005-dev
11:59 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices2005-dev
11:00 moritzm: installing libfastjson security updates
10:33 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices2005-dev - aborrero@cumin2002"
10:32 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices2005-dev - aborrero@cumin2002"
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts krb2001.codfw.wmnet
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices2005-dev.codfw.wmnet with OS bullseye
10:25 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
10:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:24 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:19 claime: mw-on-k8s: Redirect www.mediawiki.org to mw-on-k8s - T337490
10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts krb2001.codfw.wmnet
10:01 claime: mw-on-k8s: Redirect closed wikis to mw-on-k8s - T337490
09:40 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
09:37 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
09:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
09:18 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bullseye
09:17 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices2005-dev.codfw.wmnet with OS bullseye
09:17 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices2005-dev
09:17 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices2005-dev
09:11 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bullseye
09:10 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.codfw.wmnet on all recursors
09:10 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.codfw.wmnet on all recursors
09:10 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.mgmt.codfw.wmnet on all recursors
09:10 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.mgmt.codfw.wmnet on all recursors
09:09 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:09 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
09:08 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
09:06 aborrero@cumin2002: START - Cookbook sre.dns.netbox
08:19 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 19 hosts
08:18 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 19 hosts
08:18 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 771 hosts
08:17 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 771 hosts
08:15 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 1261 hosts
08:14 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 1261 hosts
08:07 aborrero@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
08:07 aborrero@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
08:07 aborrero@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
08:06 aborrero@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
07:48 taavi@deploy1002: Finished scap: Backport for extwiki: Add an alias for old NS_PROJECT name (duration: 08m 49s)
07:41 taavi@deploy1002: taavi: Backport for extwiki: Add an alias for old NS_PROJECT name synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:39 taavi@deploy1002: Started scap: Backport for extwiki: Add an alias for old NS_PROJECT name
07:37 taavi@deploy1002: Sync cancelled.
07:36 taavi@deploy1002: taavi: Backport for extwiki: Update project namespace name (T337696) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:34 taavi@deploy1002: Started scap: Backport for extwiki: Update project namespace name (T337696)
07:31 taavi@deploy1002: Sync cancelled.
07:16 taavi@deploy1002: anzx and taavi: Backport for Change dewiki import sources (T340264), Rename namespace on extwiki (T337696) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:07 taavi@deploy1002: Started scap: Backport for Change dewiki import sources (T340264), Rename namespace on extwiki (T337696)
06:28 kart_: Updated cxserver to 2023-06-26-050753-production (T340236, T339896)
06:27 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:26 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1118 from dbctl T326683', diff saved to https://phabricator.wikimedia.org/P49477 and previous config saved to /var/cache/conftool/dbconfig/20230626-062036-marostegui.json
06:15 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:10 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-06-25

01:45 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
01:35 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 04m 05s)
01:31 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
01:30 andrew@deploy1002: deploy aborted: asdf (duration: 00m 01s)
01:30 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: asdf

2023-06-23

16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
16:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
16:02 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:51 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:27 urbanecm@deploy1002: Finished scap: Backport for Section images: Placeholder should serialize to empty string (T340170) (duration: 06m 56s)
14:26 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
14:21 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
14:20 urbanecm@deploy1002: Started scap: Backport for Section images: Placeholder should serialize to empty string (T340170)
14:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: HW issues
14:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: HW issues
13:35 Emperor: update private wiki container ACLs in eqiad-swift
13:30 Emperor: update private wiki container ACLs in codfw-swift
13:29 godog: add 200G to prometheus/k8s in eqiad
12:40 elukey: move varnishkafka drmrs instances to pki
12:10 Emperor: updating ACLs on wikipedia-office containers T340189 T338765
11:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:02 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1110.eqiad.wmnet
10:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
10:12 moritzm: installing vim security updates
09:26 moritzm: uploaded openjdk-8 8u372-ga-1~deb10u1 to component/jdk8 (forward port of Java 8 for Buster)
09:20 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1110.eqiad.wmnet
08:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-cache1001.eqiad.wmnet with reason: Working on pki
08:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-cache1001.eqiad.wmnet with reason: Working on pki
08:37 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14860
05:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14860
04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P49472 and previous config saved to /var/cache/conftool/dbconfig/20230623-045758-root.json
01:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
01:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer

2023-06-22

21:00 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host phab-test1001.eqiad.wmnet
19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab-test1001.eqiad.wmnet with OS buster
19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab-test1001.eqiad.wmnet with reason: host reimage
19:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab-test1001.eqiad.wmnet with reason: host reimage
19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
19:12 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
19:11 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
19:11 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
19:09 dzahn@cumin1001: START - Cookbook sre.dns.netbox
17:32 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp1082*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:c
17:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp1082*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
17:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
17:03 brett@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-wikimedia-dns (exit_code=0) rolling restart_daemons on P{doh6001*} and A:wikidough
17:03 brett@cumin2002: START - Cookbook sre.dns.roll-restart-wikimedia-dns rolling restart_daemons on P{doh6001*} and A:wikidough
16:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
16:27 eevans@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:26 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:24 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:24 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:23 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:22 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:21 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:21 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:17 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:17 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
16:07 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
16:00 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
15:58 eevans@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
15:52 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
15:52 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
15:50 sukhe: running authdns-update to repool codfw
15:48 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:48 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:46 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
15:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:34 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
15:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:29 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
15:22 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:01 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
14:53 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
14:50 sukhe: upgrade dns3001 to gdnsd 3.99.0~alpha2
14:47 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
14:37 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
14:32 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
14:20 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:12 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
14:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudservices2004-dev.wikimedia.org
14:10 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudservices2004-dev.wikimedia.org
14:07 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
14:07 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
14:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:03 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
14:01 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
14:00 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
14:00 Lucas_WMDE: UTC afternoon backport+config window done
14:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for GrowthExperiments: Deploy section-level images structured task (T339126) (duration: 12m 49s)
13:54 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
13:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and tgr: Backport for GrowthExperiments: Deploy section-level images structured task (T339126) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for GrowthExperiments: Deploy section-level images structured task (T339126)
13:17 elukey: move varnishkafka instances in eqiad to PKI
13:16 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
13:15 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
13:15 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
13:14 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
13:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
13:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
13:11 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763) (duration: 09m 26s)
13:03 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:02 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763)
12:32 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
12:32 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:28 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
12:28 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:28 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:27 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:26 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:26 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:25 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
12:25 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:25 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:17 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:06 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
12:06 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:06 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:06 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:05 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:04 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:04 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:04 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:03 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:03 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:57 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
11:57 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:45 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
11:45 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:44 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
11:44 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:41 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
11:41 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
11:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
11:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
11:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
11:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
11:32 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
11:32 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
11:32 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001']
11:32 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001']
11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2002.codfw.wmnet to plain
11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2002.codfw.wmnet to plain
10:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:33 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
10:33 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
10:33 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
10:32 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
10:32 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:31 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
10:29 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:29 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
10:25 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
10:25 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
10:24 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
10:23 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
10:23 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
10:22 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
10:07 moritzm: installing Apache security updates on Bullseye
09:51 ladsgroup@deploy1002: Finished scap: Backport for Fix adding a domain when the page doesn't exist (duration: 08m 05s)
09:44 ladsgroup@deploy1002: ladsgroup: Backport for Fix adding a domain when the page doesn't exist synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
09:43 ladsgroup@deploy1002: Started scap: Backport for Fix adding a domain when the page doesn't exist
09:40 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
09:40 root@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
09:33 root@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
09:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
09:29 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
09:29 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
09:27 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
09:26 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
09:12 vgutierrez: increasing maxconns to 2000 in haproxy for port 80 - T339898
08:50 vgutierrez: tighten HAProxy timeouts on port 80 globally - T339898
08:23 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test 931926 - jbond@cumin2002"
08:22 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test 931926 - jbond@cumin2002"
07:43 moritzm: installing containerd security updates
06:55 apergos: rsync in ariel screensession on dumpsdata1003 pulling from dumpsdata1004, bwlimit 100000 (=1G) of misc dumps files
06:39 kart_: Updated cxserver to 2023-06-21-112200-production (T339896, T338123)
06:38 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:38 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:36 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:35 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy102[47] - marostegui@cumin1001"
06:34 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy102[47] - marostegui@cumin1001"
06:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
06:32 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1023 - marostegui@cumin1001"
06:29 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1023 - marostegui@cumin1001"
06:27 marostegui@cumin1001: START - Cookbook sre.dns.netbox
05:57 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1022 - marostegui@cumin1001"
05:16 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1022 - marostegui@cumin1001"
05:14 marostegui@cumin1001: START - Cookbook sre.dns.netbox
03:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
03:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
03:07 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
03:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
02:52 rzl@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
02:51 rzl@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
02:37 rzl@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
02:35 rzl@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1001.eqiad.wmnet
00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
00:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bullseye
00:40 dzahn@cumin1001: START - Cookbook sre.dns.netbox
00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
00:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host phab-test1001.eqiad.wmnet
00:33 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host phab-test1001.eqiad.wmnet with OS buster
00:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
00:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
00:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bullseye
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bullseye
00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
00:02 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
00:01 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
00:01 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
00:00 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"

2023-06-21

23:58 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
23:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1002.eqiad.wmnet
23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
23:56 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:55 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:53 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
23:53 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:52 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
23:50 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1002.eqiad.wmnet
23:50 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1002.eqiad.wmnet
23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
23:50 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:49 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:49 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
23:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
23:47 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:46 tstarling@deploy1002: Synchronized multiversion: Fix some mwscript bugs and clean up the style (duration: 06m 31s)
23:46 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
23:42 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1002.eqiad.wmnet
23:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
23:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1001.eqiad.wmnet
23:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
23:32 urbanecm: Move a large translatable page on foundationwiki (T338217)
23:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:32 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
23:30 urbanecm: Move a large translatable page (T339154)
23:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host phab-test1001.eqiad.wmnet
23:27 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host phab-test1001.eqiad.wmnet with OS buster
23:27 urbanecm: Move large translatable page (`mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Movement Strategy and Governance/Movement Charter Ambassadors Program/grant' 'Movement Charter/Ambassadors Program/Grant' 'Martin Urbanec' --reason='restructuring of the Movement Charter's Meta infrastructure (per request)'`; T338808)
23:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
23:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
23:09 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
23:09 mutante: created temporary test VM phab-test1001.eqiad.wmnet which we need for a one-time test for T335080 - it will soon be destroyed again
23:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
23:07 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
23:07 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
23:02 dzahn@cumin1001: START - Cookbook sre.dns.netbox
23:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
23:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
23:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:54 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
22:52 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp1078.eqiad.wmnet,cp1080.eqiad.wmnet,cp1082.eqiad.wmnet,cp1084.eqiad.wmnet,cp1086.eqiad.wmnet,cp1088.eqiad.wmnet,cp1090.eqiad.wmnet} and A:cp
22:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
22:48 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp1077.eqiad.wmnet,cp1079.eqiad.wmnet,cp1081.eqiad.wmnet,cp1083.eqiad.wmnet,cp1085.eqiad.wmnet,cp1087.eqiad.wmnet,cp1089.eqiad.wmnet} and A:cp
22:39 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:39 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:39 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:39 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:36 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
22:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
22:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
22:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
22:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gerrit1001.wikimedia.org
22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gerrit1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
22:25 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gerrit1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
22:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
22:19 eevans@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS bullseye
22:16 mutante: destroying previous production gerrit server gerrit1001 - T336427
22:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gerrit1001.wikimedia.org
22:10 mutante: rsyncing data from cobalt.wikimedia.org (:p) from gerrit1001 to gerrit1003, /srv/gerrit/cobalt/
21:30 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
21:28 eevans@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
21:24 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
21:23 eevans@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
21:23 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:52 kostajh: UTC late deploys done
20:52 kharlan@deploy1002: Finished scap: Backport for Section images: Select placeholder when inserting it (T335209) (duration: 10m 21s)
20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2002.codfw.wmnet
20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
20:43 kharlan@deploy1002: kharlan: Backport for Section images: Select placeholder when inserting it (T335209) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:42 kharlan@deploy1002: Started scap: Backport for Section images: Select placeholder when inserting it (T335209)
20:41 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
20:36 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:30 mutante: gerrit1001 (formerly gerrit prod) - creating tarball of entire /home/ in /home/ and copying it over to gerrit1003 - simultaneously adding /home on gerrit servers to bacula from now on - T336427
20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2002.codfw.wmnet
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
20:13 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
20:09 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:04 mutante: deleting VMs people1003.eqiad.wmnet and people2002.codfw.wmnet T338827
20:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
19:59 mutante: people.wikimedia.org - disabling shell access to people1003/people2002 (bullseye), use people1004/people2002 (bookworm) or people.eqiad.wmnet / people.codfw.wmnet in your configs if you have something automated or git repos - T338827
19:28 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:22 ejegg: civicrm upgraded from 4a4b014a to 98b2b5de
19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:03 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:01 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
19:01 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
19:00 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
19:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bullseye
18:48 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp1078.eqiad.wmnet,cp1080.eqiad.wmnet,cp1082.eqiad.wmnet,cp1084.eqiad.wmnet,cp1086.eqiad.wmnet,cp1088.eqiad.wmnet,cp1090.eqiad.wmnet} and A:cp
18:27 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp1077.eqiad.wmnet,cp1079.eqiad.wmnet,cp1081.eqiad.wmnet,cp1083.eqiad.wmnet,cp1085.eqiad.wmnet,cp1087.eqiad.wmnet,cp1089.eqiad.wmnet} and A:cp
18:24 sukhe: sudo ipmitool -I lanplus -H "sessionstore2001.mgmt.codfw.wmnet" -U root -E mc reset cold
18:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:13 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:12 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:12 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/racktables T327405
18:12 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:09 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/annualreport T337041
18:08 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/bienvenida T337047
18:06 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/TransparencyReport T338781
18:00 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
18:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:59 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:59 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:44 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:44 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:43 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:40 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:39 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:39 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
17:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
17:37 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:36 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
17:34 ejegg: civicrm upgraded from b11db56d to 4a4b014a
17:34 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
17:30 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:29 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
17:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
17:27 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
17:27 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
17:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
17:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
17:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
17:23 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
17:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:22 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
17:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:22 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
17:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:21 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
17:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
17:20 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
17:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
17:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
17:16 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:16 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:14 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
17:13 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
17:01 sukhe: sudo ipmitool -I lanplus -H "sessionstore2001.mgmt.codfw.wmnet" -U root -E chassis power reset
16:47 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:47 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
16:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
16:43 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:42 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1003.eqiad.wmnet
16:39 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:39 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:39 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:39 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
16:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
16:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1003.eqiad.wmnet
16:37 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
16:15 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS bullseye
16:02 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1003.eqiad.wmnet
16:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
16:00 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:58 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2001.codfw.wmnet
15:46 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.*
15:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.*
15:44 mforns@deploy1002: Finished deploy [airflow-dags/analytics@d9a9135]: (no justification provided) (duration: 00m 09s)
15:44 mforns@deploy1002: Started deploy [airflow-dags/analytics@d9a9135]: (no justification provided)
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
15:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
15:24 sukhe: run authdns-update to depool codfw
15:24 sukhe: run authdns-update to depool codfw
15:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-upload_eqiad
15:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_eqiad
15:20 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
15:19 moritzm: installing php7.3 security updates
15:18 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
15:09 moritzm: installing joblib security updates
15:08 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
15:03 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
15:03 urandom: depooling sessionstore/codfw — T340043
14:50 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
14:49 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
14:47 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
14:47 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:40 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
14:40 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:33 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
14:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
14:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1024.eqiad.wmnet with OS bullseye
14:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
14:23 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
14:21 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:21 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.private.codfw.wikimedia.cloud on all recursors
14:20 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.private.codfw.wikimedia.cloud on all recursors
14:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
14:19 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
14:17 aborrero@cumin2002: START - Cookbook sre.dns.netbox
14:11 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:11 bking@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
14:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
14:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
14:01 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
13:50 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1024.eqiad.wmnet with OS bullseye
13:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bullseye
13:47 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
13:46 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
13:46 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1024 - robh@cumin1001"
13:45 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1024 - robh@cumin1001"
13:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
13:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
13:43 robh@cumin1001: START - Cookbook sre.dns.netbox
13:39 volans: installed spicerack 7.2.1 to the cumin/cloudcumin hosts
13:36 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
13:36 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
13:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
13:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
13:30 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
13:30 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
13:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
13:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
13:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
13:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
13:19 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
13:18 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
13:18 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
13:18 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
13:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
13:16 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
13:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
12:58 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:57 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:53 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:51 elukey: move varnishkafka instances in codfw to PKI
12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices2005-dev
12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
12:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
12:41 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
12:41 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
12:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
12:30 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
12:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
12:12 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
12:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
12:06 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test change - jbond@cumin1001"
12:03 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
12:01 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudservices2005-dev
11:40 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
11:40 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
11:40 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
11:40 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
11:39 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
11:39 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
11:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
11:39 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
11:02 moritzm: installing python2.7 security updates
10:58 vgutierrez: re-enable puppet in A:cp - T339898
10:57 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 48s)
10:57 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
10:51 volans: uploaded spicerack_7.2.1 to apt.wikimedia.org bullseye-wikimedia
10:37 dcausse@deploy1002: Finished deploy [airflow-dags/search@29d9615]: search: schedule cirrus_consistency_check (take 2) (duration: 00m 10s)
10:37 dcausse@deploy1002: Started deploy [airflow-dags/search@29d9615]: search: schedule cirrus_consistency_check (take 2)
10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[1124,1133].eqiad.wmnet with reason: Testing cloning
10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[1124,1133].eqiad.wmnet with reason: Testing cloning
09:59 dcausse@deploy1002: Finished deploy [airflow-dags/search@9c03845]: search: schedule cirrus_consistency_check (duration: 00m 18s)
09:58 dcausse@deploy1002: Started deploy [airflow-dags/search@9c03845]: search: schedule cirrus_consistency_check
09:38 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:21 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
09:20 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 01m 14s)
09:20 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
09:19 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
09:18 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
09:17 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
09:17 jbond@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "test sync - jbond@cumin1001"
09:16 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
09:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
09:14 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
09:06 jbond: disable puppet on R:git::clone to deploy gerrit:927750
08:36 vgutierrez: disable puppet on A:cp before merging Ie84c15
08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
07:23 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
07:22 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
07:21 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
07:21 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
07:13 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1003.eqiad.wmnet with OS bullseye
06:44 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
06:44 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
06:40 hashar@deploy1002: Finished deploy [integration/docroot@51d2552]: Add TimedMediaHandler to docroot - T338458 (duration: 00m 11s)
06:40 hashar@deploy1002: Started deploy [integration/docroot@51d2552]: Add TimedMediaHandler to docroot - T338458
06:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1003.eqiad.wmnet with reason: host reimage
06:04 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1003.eqiad.wmnet with reason: host reimage
06:03 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
00:20 tzatziki: removing one file for legal compliance
00:13 tzatziki: removing 2 files for legal compliance
00:11 tzatziki: removing one file for legal compliance

2023-06-20

23:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
22:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS buster
22:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
22:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
22:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
22:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
22:13 tgr_: UTC late backports done
22:12 tgr@deploy1002: Finished scap: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225) (duration: 22m 30s)
22:01 tgr@deploy1002: tgr and kharlan: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
21:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_esams
21:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
21:49 tgr@deploy1002: Started scap: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225)
21:36 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - page_content_change should use eventgate-analytics-external for canary events - T336817 (duration: 07m 22s)
21:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
21:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
21:25 sbassett: Deployed updated mitigation for T336027
21:18 tgr@deploy1002: Finished scap: Backport for Remove unused data attribs on a/v sources (T199129) (duration: 18m 45s)
21:01 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
21:01 tgr@deploy1002: jforrester and tgr: Backport for Remove unused data attribs on a/v sources (T199129) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
21:00 tgr@deploy1002: Started scap: Backport for Remove unused data attribs on a/v sources (T199129)
20:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bullseye
20:47 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bullseye
20:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:42 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:42 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
20:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
20:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
20:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
20:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
20:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
20:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
20:16 samtar@deploy1002: Finished scap: Backport for Turn off Zebra test for multiple wikis (T337956) (duration: 13m 32s)
20:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
20:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
20:09 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
20:03 samtar@deploy1002: ksarabia and samtar: Backport for Turn off Zebra test for multiple wikis (T337956) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:02 samtar@deploy1002: Started scap: Backport for Turn off Zebra test for multiple wikis (T337956)
19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts parse1002.eqiad.wmnet
19:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bullseye
19:30 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
19:28 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
19:13 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:13 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:09 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:08 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:07 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
19:06 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:05 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
19:05 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
19:04 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
18:54 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:54 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:50 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:47 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:47 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
18:37 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
18:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
18:33 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
18:28 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:28 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
18:17 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
18:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@d55173d]: (no justification provided) (duration: 00m 11s)
18:12 mforns@deploy1002: Started deploy [airflow-dags/analytics@d55173d]: (no justification provided)
18:03 joal@deploy1002: Finished deploy [analytics/refinery@181eac6] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@181eac6] (duration: 01m 52s)
18:01 joal@deploy1002: Started deploy [analytics/refinery@181eac6] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@181eac6]
18:01 joal@deploy1002: Finished deploy [analytics/refinery@181eac6] (thin): Hotfix analytics deploy THIN [analytics/refinery@181eac6] (duration: 00m 04s)
18:01 joal@deploy1002: Started deploy [analytics/refinery@181eac6] (thin): Hotfix analytics deploy THIN [analytics/refinery@181eac6]
18:00 joal@deploy1002: Finished deploy [analytics/refinery@181eac6]: Hotfix analytics deploy [analytics/refinery@181eac6] (duration: 06m 22s)
17:54 joal@deploy1002: Started deploy [analytics/refinery@181eac6]: Hotfix analytics deploy [analytics/refinery@181eac6]
17:54 sukhe: running authdns-update for T339942
17:44 ottomata: remove stream-enrichment-poc namespace and related resources from dse-k8s-eqiad - T325303
17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
17:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
16:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:55 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
16:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
16:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
16:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:49 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
16:49 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
16:44 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
16:44 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:28 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - remove unused rc stream names for page_change related streams - T336817 (duration: 07m 35s)
16:21 sukhe: sudo cumin 'A:cp' 'enable-puppet "merging CR 931626"'
16:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventBusStreamNamesMap - Remove page_change stream name override - T336817 (duration: 07m 42s)
16:14 sukhe: sudo cumin 'A:cp' 'disable-puppet "merging CR 931626"'
16:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
16:09 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
15:25 moritzm: installing unbound security updates
15:14 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
15:13 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
15:13 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
15:13 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:55 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:42 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
14:36 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
14:36 arturo: homer run for CR eqiad/codfw to allow bacula traffic in from cloud-hosts (T338132, T339894)
14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host parse1002.eqiad.wmnet
14:16 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes20[12][0-9].codfw.wmnet
14:15 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes10[12][0-9].eqiad.wmnet
14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes202[0-9].codfw.wmnet
14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes201[0-9].codfw.wmnet
14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes102[0-9].eqiad.wmnet
14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0-9].eqiad.wmnet
14:14 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1*.eqiad.wmnet
14:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4051.ulsfo.wmnet} and A:cp
14:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4051.ulsfo.wmnet} and A:cp
14:06 vgutierrez: test HAProxy 2.6.14 on cp4044 and cp4051
14:03 vgutierrez: fetch HAProxy 2.6.14 on thirdparty/haproxy26 for bullseye (apt.wm.o)
13:22 vgutierrez: repooling cp3050 - T339898
13:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
13:18 moritzm: installing python2.7 security updates
13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts otrs1001.eqiad.wmnet
13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: otrs1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
13:14 urbanecm@deploy1002: Finished scap: Backport for Enable Extension:Translate on pt.wikisource.org (T339139) (duration: 09m 11s)
13:13 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: otrs1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
13:10 aokoth@cumin1001: START - Cookbook sre.dns.netbox
13:06 urbanecm@deploy1002: albertoleoncio and urbanecm: Backport for Enable Extension:Translate on pt.wikisource.org (T339139) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:05 urbanecm: Create ext:Translate tables on ptwikisource (T339139)
13:04 urbanecm@deploy1002: Started scap: Backport for Enable Extension:Translate on pt.wikisource.org (T339139)
13:04 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts otrs1001.eqiad.wmnet
13:04 urbanecm: Start foreachwikiindblist 'group2 & s1' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all on a tmux in mwmaint1002 (T315510)
12:58 jclark@cumin1001: START - Cookbook sre.hosts.reboot-single for host parse1002.eqiad.wmnet
12:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet
12:47 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts otrs1001.eqiad.wmnet
12:46 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts otrs1001.eqiad.wmnet
12:37 vgutierrez: depooling cp3050 - T339898
12:32 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
12:32 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:26 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
12:25 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
11:27 jnuche@deploy1002: deploy aborted: (no justification provided) (duration: 01m 32s)
11:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
11:15 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
10:57 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:30 ladsgroup@deploy1002: Finished scap: Backport for Stop setting wgLegacyEncoding (T128150 T128151) (duration: 08m 06s)
10:23 ladsgroup@deploy1002: ladsgroup: Backport for Stop setting wgLegacyEncoding (T128150 T128151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
10:22 ladsgroup@deploy1002: Started scap: Backport for Stop setting wgLegacyEncoding (T128150 T128151)
10:16 Lucas_WMDE: deployed patches for T339111
09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:23 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1003.eqiad.wmnet with OS bullseye
09:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:02 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1119.eqiad.wmnet with OS bookworm
08:37 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
08:37 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
08:06 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1003.eqiad.wmnet with OS bullseye
07:40 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1119.eqiad.wmnet with reason: host reimage
07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1119.eqiad.wmnet with reason: host reimage
07:20 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
07:18 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
07:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1119.eqiad.wmnet with OS bookworm
07:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834) (duration: 10m 25s)
07:07 moritzm: installing openssl security updates on buster
07:05 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834)
06:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1119.eqiad.wmnet with OS bookworm
06:29 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1119.eqiad.wmnet with OS bookworm
05:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14860
05:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14860
00:14 zabe: Deployed patch for T330968

2023-06-19

16:41 ladsgroup@deploy1002: Finished scap: Backport for Revert "Temporarily bring back legacy encoding in four wikis" (duration: 15m 19s)
16:27 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Temporarily bring back legacy encoding in four wikis" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
16:26 ladsgroup@deploy1002: Started scap: Backport for Revert "Temporarily bring back legacy encoding in four wikis"
16:22 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
16:09 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:50 elukey@cumin1001: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:ml-cache-codfw: Applying internode-encryption: all - elukey@cumin1001
15:47 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Applying internode-encryption: all - elukey@cumin1001
15:22 brett: Rolling reboot of codfw cache_text nodes to apply Linux update for CVE-2023-1872 - T335835
15:07 moritzm: installing libxpm security updates
15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
14:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:46 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:45 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:39 ladsgroup@deploy1002: Finished scap: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649) (duration: 20m 07s)
14:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:24 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
14:23 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
14:20 ladsgroup@deploy1002: ladsgroup: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:19 ladsgroup@deploy1002: Started scap: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649)
14:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
14:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
14:04 elukey: move varnishafka instances in eqsin to PKI
13:44 kamila_: updated DNS: added discovery records for rest-gateway and device-analytics T335505
13:14 moritzm: installing openjdk-17 security updates
12:21 moritzm: uploaded wmfmariadbpy 0.10+deb12u1 T339835
12:01 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
12:01 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
12:00 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
12:00 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:55 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:55 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:54 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:54 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1060.eqiad.wmnet
11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1060.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
11:38 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1060.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
11:36 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
11:28 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1060.eqiad.wmnet
11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49449 and previous config saved to /var/cache/conftool/dbconfig/20230619-110207-ladsgroup.json
10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1059.eqiad.wmnet
10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1059.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
10:58 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1059.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
10:56 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bookworm
10:52 moritzm: imported megacli and ssacli to thirdparty/hwraid for bookworm-wikimedia T339847
10:48 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1059.eqiad.wmnet
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49448 and previous config saved to /var/cache/conftool/dbconfig/20230619-104702-ladsgroup.json
10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49447 and previous config saved to /var/cache/conftool/dbconfig/20230619-103157-ladsgroup.json
10:17 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
10:16 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49446 and previous config saved to /var/cache/conftool/dbconfig/20230619-101653-ladsgroup.json
10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P49445 and previous config saved to /var/cache/conftool/dbconfig/20230619-101623-ladsgroup.json
10:15 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
10:15 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
10:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
10:00 claime: Switching test.wikipedia.org to mw-on-k8s - T337489
09:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
09:43 ladsgroup@deploy1002: Finished scap: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431) (duration: 10m 45s)
09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1058.eqiad.wmnet
09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1058.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
09:34 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:34 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
09:34 ladsgroup@deploy1002: ladsgroup: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
09:32 ladsgroup@deploy1002: Started scap: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431)
09:30 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1058.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
09:30 ladsgroup@deploy1002: Finished scap: Backport for Blocked domains: Fix removing a domain via the special page (T337431) (duration: 08m 24s)
09:27 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
09:22 ladsgroup@deploy1002: ladsgroup: Backport for Blocked domains: Fix removing a domain via the special page (T337431) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
09:21 ladsgroup@deploy1002: Started scap: Backport for Blocked domains: Fix removing a domain via the special page (T337431)
09:21 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1058.eqiad.wmnet
09:15 kart_: Updated MinT to 2023-06-16-042302-production, Updated people egress (T339271, T335491)
09:12 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
09:12 ladsgroup@deploy1002: Finished scap: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431) (duration: 09m 53s)
09:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
09:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
09:03 ladsgroup@deploy1002: ladsgroup: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
09:02 ladsgroup@deploy1002: Started scap: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431)
09:01 ladsgroup@deploy1002: Finished scap: Backport for Temporarily bring back legacy encoding in four wikis (T128150) (duration: 07m 31s)
09:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
08:55 ladsgroup@deploy1002: ladsgroup: Backport for Temporarily bring back legacy encoding in four wikis (T128150) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
08:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
08:53 ladsgroup@deploy1002: Started scap: Backport for Temporarily bring back legacy encoding in four wikis (T128150)
08:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
08:49 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1124.eqiad.wmnet with OS bookworm
08:45 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: First decompress gziped entries before iconv (T128150) (duration: 08m 52s)
08:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
08:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
08:37 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: First decompress gziped entries before iconv (T128150) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
08:36 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: First decompress gziped entries before iconv (T128150)
08:30 fabfur: rebooting cp3050 and cp3051 for kernel upgrade (T335835)
08:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
08:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
08:20 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 19 hosts
08:20 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 19 hosts
08:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 776 hosts
08:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 776 hosts
08:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
08:03 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1124.eqiad.wmnet with OS bullseye
07:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 1259 hosts
07:54 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 1259 hosts
07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
07:39 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1124.eqiad.wmnet with OS bookworm
07:38 moritzm: uploaded wmfmariadbpy 0.10+deb12u1
07:14 kartik@deploy1002: Finished scap: Backport for Use Parsoid for all Wikis for Content Translation (T339322) (duration: 11m 31s)
07:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
07:04 kartik@deploy1002: kartik: Backport for Use Parsoid for all Wikis for Content Translation (T339322) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
07:03 kartik@deploy1002: Started scap: Backport for Use Parsoid for all Wikis for Content Translation (T339322)
06:39 urbanecm@deploy1002: Finished scap: Backport for Add throttle rule (duration: 07m 10s)
06:32 urbanecm@deploy1002: Started scap: Backport for Add throttle rule
05:34 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:14 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
04:29 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2023-06-18

10:02 hashar@deploy1002: Synchronized php-1.41.0-wmf.13/extensions/CirrusSearch: T339810 - token_count_router: infer the analyzer from the field (duration: 05m 50s)
09:50 hashar@deploy1002: Synchronized php-1.41.0-wmf.13/extensions/WikibaseCirrusSearch: T339810 - token_count_router: infer the analyzer from the field (duration: 14m 11s)
09:29 hashar: mwdebug1001: scap pull of https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CirrusSearch/+/930910 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseCirrusSearch/+/930909 # T339810

2023-06-16

22:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
21:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
21:08 sbassett: Deployed updated security mitigation for T336027
21:04 brett: Finished rolling reboot of codfw cache_upload nodes to apply Linux update for CVE-2023-1872 - T335835
19:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye
19:03 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
19:03 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
18:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
17:53 wfan: civicrm upgraded from d61220cd to b11db56d
16:14 brett: Rolling reboot of codfw cache_upload nodes to apply Linux update for CVE-2023-1872 - T335835
16:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
15:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
15:13 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:13 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
15:12 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
15:09 aborrero@cumin1001: START - Cookbook sre.dns.netbox
14:59 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:59 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
14:57 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
14:54 aborrero@cumin1001: START - Cookbook sre.dns.netbox
14:40 aborrero@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
14:40 aborrero@cumin1001: START - Cookbook sre.dns.netbox
13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: hw troubleshooting
13:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: hw troubleshooting
13:54 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:52 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
13:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:51 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
12:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts parse1002.eqiad.wmnet
12:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet
12:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:00 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
12:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:56 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:53 aborrero@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:53 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:47 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:46 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
11:15 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
10:38 Amir1: root@cumin1001:/home/ladsgroup/software2/dbtools# cat s1.dblist | grep -v "#" | while read db; do cat tables_to_check.txt | while read table index; do echo "$db.$table"; db-compare $db $table $index db1135.eqiad.wmnet:3306 db1118 db1139:3311 || break 2; done ; done (T338354)
09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:02 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
08:47 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
08:41 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
08:35 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
08:25 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
08:19 hashar@deploy1002: Finished scap: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934) (duration: 21m 08s)
08:00 hashar@deploy1002: hashar: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
07:58 hashar@deploy1002: Started scap: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934)
07:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
07:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
01:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
01:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b8-codfw.mgmt.codfw.wmnet
01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - pt1979@cumin2002"
01:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - pt1979@cumin2002"
01:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
01:31 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b8-codfw.mgmt.codfw.wmnet
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b7-codfw.mgmt.codfw.wmnet
01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - pt1979@cumin2002"
01:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - pt1979@cumin2002"
01:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
01:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b7-codfw.mgmt.codfw.wmnet
01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b6-codfw.mgmt.codfw.wmnet
01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - pt1979@cumin2002"
01:20 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - pt1979@cumin2002"
01:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
01:17 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
01:16 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b5-codfw.mgmt.codfw.wmnet
01:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - pt1979@cumin2002"
01:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - pt1979@cumin2002"
01:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
01:10 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b4-codfw.mgmt.codfw.wmnet
01:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - pt1979@cumin2002"
01:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - pt1979@cumin2002"
01:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
01:01 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b3-codfw.mgmt.codfw.wmnet
01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - pt1979@cumin2002"
00:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - pt1979@cumin2002"
00:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1149.eqiad.wmnet with OS bullseye
00:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:56 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b2-codfw.mgmt.codfw.wmnet
00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - pt1979@cumin2002"
00:46 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - pt1979@cumin2002"
00:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:42 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a7-codfw.mgmt.codfw.wmnet
00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - pt1979@cumin2002"
00:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - pt1979@cumin2002"
00:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:25 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
00:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye
00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a6-codfw.mgmt.codfw.wmnet
00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - pt1979@cumin2002"
00:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - pt1979@cumin2002"
00:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:18 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a5-codfw.mgmt.codfw.wmnet
00:16 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - pt1979@cumin2002"
00:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - pt1979@cumin2002"
00:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:11 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a4-codfw.mgmt.codfw.wmnet
00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - pt1979@cumin2002"
00:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - pt1979@cumin2002"
00:06 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:06 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a3-codfw.mgmt.codfw.wmnet
00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - pt1979@cumin2002"
00:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - pt1979@cumin2002"
00:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
00:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
00:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet

2023-06-15

23:58 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
23:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a2-codfw.mgmt.codfw.wmnet
23:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - pt1979@cumin2002"
23:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - pt1979@cumin2002"
23:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:51 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
23:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
23:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
23:44 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['an-worker1153']
23:44 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
23:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
23:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
23:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS bullseye
23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
23:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
23:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
23:26 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
23:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
23:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
23:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1151']
23:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151']
23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1152']
23:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152']
23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
23:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
23:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
23:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1155']
23:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155']
23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1156']
22:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1156']
22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
22:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1156']
22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1155']
22:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
22:30 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
22:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155']
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
22:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
22:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
22:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:14 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
22:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
22:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
21:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1152']
21:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152']
21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1151']
21:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151']
21:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
21:24 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
21:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:19 jhancock@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
21:19 jhancock@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
21:14 thcipriani: parse1002 having ssh connection problems during backport window
21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:13 thcipriani@deploy1002: Finished scap: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254) (duration: 16m 09s)
21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
21:08 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
20:58 thcipriani@deploy1002: thcipriani and matmarex: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:57 thcipriani@deploy1002: Started scap: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254)
20:54 thcipriani@deploy1002: Finished scap: Backport for [uzwiki] Add the 'patroller' usergroup (T338826) (duration: 15m 27s)
20:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
20:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
20:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
20:40 thcipriani@deploy1002: superpes and thcipriani: Backport for [uzwiki] Add the 'patroller' usergroup (T338826) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:38 thcipriani@deploy1002: Started scap: Backport for [uzwiki] Add the 'patroller' usergroup (T338826)
20:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1149']
20:30 thcipriani@deploy1002: Finished scap: Backport for Remove GDI survey from RU and JA wikis. (T338926) (duration: 16m 30s)
20:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
20:15 thcipriani@deploy1002: essexigyan and thcipriani: Backport for Remove GDI survey from RU and JA wikis. (T338926) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:13 thcipriani@deploy1002: Started scap: Backport for Remove GDI survey from RU and JA wikis. (T338926)
19:06 ladsgroup@deploy1002: Finished scap: Backport for Enable blocked domain list in testwiki and fawiki (T337431) (duration: 17m 40s)
18:50 ladsgroup@deploy1002: ladsgroup: Backport for Enable blocked domain list in testwiki and fawiki (T337431) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
18:48 ladsgroup@deploy1002: Started scap: Backport for Enable blocked domain list in testwiki and fawiki (T337431)
18:48 ryankemper: [WDQS] `ryankemper@wdqs2012:~$ sudo pool`
18:44 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
18:44 ladsgroup@deploy1002: Finished scap: Backport for BlockedDomains: Add logging in case of hit (T337431) (duration: 30m 33s)
18:25 ladsgroup@deploy1002: ladsgroup: Backport for BlockedDomains: Add logging in case of hit (T337431) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
18:13 ladsgroup@deploy1002: Started scap: Backport for BlockedDomains: Add logging in case of hit (T337431)
17:13 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:13 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:12 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:12 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:02 joal@deploy1002: Finished deploy [airflow-dags/analytics@bba655e]: (no justification provided) (duration: 00m 11s)
17:02 joal@deploy1002: Started deploy [airflow-dags/analytics@bba655e]: (no justification provided)
17:00 jnuche@deploy1002: Installation of scap version "4.53.0" completed for 594 hosts
16:59 jnuche@deploy1002: Installing scap version "4.53.0" for 594 hosts
16:55 jnuche@deploy1002: Installing scap version "4.53.0" for 595 hosts
16:53 jnuche@deploy1002: Installing scap version "4.53.0" for 595 hosts
16:52 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices2004-dev.codfw.wmnet with OS bullseye
16:52 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
16:51 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
16:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
16:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
16:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
16:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
16:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
15:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:56 joal@deploy1002: Finished deploy [airflow-dags/analytics@c584b62]: (no justification provided) (duration: 00m 12s)
15:56 joal@deploy1002: Started deploy [airflow-dags/analytics@c584b62]: (no justification provided)
15:51 mutante: phabricator - made jnuche (https://phabricator.wikimedia.org/people/manage/32076/) an Administrator T339174
15:46 milimetric@deploy1002: Finished deploy [analytics/refinery@106bf30] (thin): Patch for HiveToDruid with snapshots [thin] (duration: 00m 04s)
15:45 milimetric@deploy1002: Started deploy [analytics/refinery@106bf30] (thin): Patch for HiveToDruid with snapshots [thin]
15:44 claime: mw2323.codfw.wmnet repooled following T326564
15:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2323.codfw.wmnet
15:44 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2323.codfw.wmnet
15:44 milimetric@deploy1002: Finished deploy [analytics/refinery@106bf30]: Patch for HiveToDruid with snapshots (duration: 07m 01s)
15:43 claime: mw2324.codfw.wmnet repooled following T326564
15:39 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet
15:37 milimetric@deploy1002: Started deploy [analytics/refinery@106bf30]: Patch for HiveToDruid with snapshots
15:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2324.codfw.wmnet
15:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2324.codfw.wmnet
15:36 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2324.codfw.wmnet
15:33 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw2324.codfw.wmnet
15:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
15:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
15:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
15:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
15:28 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
15:27 claime: mw2411.codfw.wmnet repooled following T326564
15:26 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2411.codfw.wmnet
15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
15:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2411.codfw.wmnet
15:24 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2411.codfw.wmnet
15:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
15:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
15:21 claime: mw2401.codfw.wmnet repooled following T326564
15:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
15:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
15:17 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:17 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
15:16 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
15:16 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
15:16 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2401.codfw.wmnet
15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2401.codfw.wmnet
15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2401.codfw.wmnet
15:14 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
15:14 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
15:14 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
15:14 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
15:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
15:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
15:12 claime: Deploying new mediawiki chart: Gracefully handle termination - T331609
15:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:00 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:55 claime: Powering down mw2401 mw2411 mw2324 mw2323 - T326564
14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2323.codfw.wmnet
14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2324.codfw.wmnet
14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2411.codfw.wmnet
14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2401.codfw.wmnet
14:53 claime: Depooling mw2401 mw2411 mw2324 mw2323 as inactive for powerdown - T326564
14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2323.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2323.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2324.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2324.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2411.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2411.codfw.wmnet with reason: powering off for T326564
14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2401.codfw.wmnet with reason: powering off for T326564
14:51 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2401.codfw.wmnet with reason: powering off for T326564
14:40 Lucas_WMDE: UTC afternoon backport+config window done (maintenance script runs are ongoing and “will probably take a few weeks to complete”)
14:39 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s6' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s6-exited-$?` in tmux on mwmaint1002 (T315510)
14:39 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s5' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s5-exited-$?` in tmux on mwmaint1002 (T315510)
14:35 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s3' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s3-exited-$?` in tmux on mwmaint1002 (T315510)
14:34 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s2' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s2-exited-$?` in tmux on mwmaint1002 (T315510)
14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527) (duration: 09m 53s)
14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
14:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wsung: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527)
14:01 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886) (duration: 08m 44s)
14:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5032.eqsin.wmnet
14:00 moritzm: remove ruby2.5 2.5.5-3+deb10u5+wmf1 (superseded by corrected Debian build 2.5.5-3+deb10u6) T338294
14:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5024.eqsin.wmnet
13:55 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:54 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:54 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:54 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:54 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:54 lucaswerkmeister-wmde@deploy1002: reedy and lucaswerkmeister-wmde: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:54 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:53 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:53 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886)
13:51 moritzm: installing ruby2.5 security updates
13:49 fabfur: reboot cp5024 and cp5032 for kernel upgrade (T335835)
13:49 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5024.eqsin.wmnet
13:49 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5032.eqsin.wmnet
13:48 samtar@deploy1002: Finished scap: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207) (duration: 09m 40s)
13:40 samtar@deploy1002: kharlan and samtar: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:39 samtar@deploy1002: Started scap: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207)
13:28 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
13:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5031.eqsin.wmnet
13:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5023.eqsin.wmnet
13:21 daniel@deploy1002: Finished scap: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529) (duration: 11m 48s)
13:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
13:13 fabfur: reboot cp5023 and cp5031 for kernel upgrade (T335835)
13:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5031.eqsin.wmnet
13:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5023.eqsin.wmnet
13:10 daniel@deploy1002: daniel: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:09 daniel@deploy1002: Started scap: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529)
13:08 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:08 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
13:07 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:05 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
12:59 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
12:58 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
12:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
12:57 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
12:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5022.eqsin.wmnet
12:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5030.eqsin.wmnet
12:48 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
12:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
12:40 fabfur: reboot cp5022 and cp5030 for kernel upgrade (T335835)
12:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5022.eqsin.wmnet
12:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5030.eqsin.wmnet
12:35 moritzm: installing ffmpeg security updates
12:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@d458338]: (no justification provided) (duration: 00m 09s)
12:34 joal@deploy1002: Started deploy [airflow-dags/analytics@d458338]: (no justification provided)
12:27 moritzm: installing containerd security updates
12:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5029.eqsin.wmnet
12:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5021.eqsin.wmnet
12:14 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
12:11 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
12:07 fabfur: reboot cp5021 and cp5029 for kernel upgrade (T335835)
12:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5021.eqsin.wmnet
12:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5029.eqsin.wmnet
12:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
12:02 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
11:58 moritzm: restarting exim on lists1001
11:52 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bullseye
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
11:51 claime: Repooled parse1002.eqiad.wmnet after powercycle
11:49 moritzm: restarting slapd on seaborgium/serpens
11:48 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
11:48 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
11:46 ladsgroup@deploy1002: Finished scap: Backport for Switch five large wikis to extlinks read new (T335343) (duration: 09m 10s)
11:45 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
11:40 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
11:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on parse1002.eqiad.wmnet with reason: Powercycle
11:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on parse1002.eqiad.wmnet with reason: Powercycle
11:39 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema
11:39 claime: parse1002 not responding to ssh or console, depooled
11:38 ladsgroup@deploy1002: ladsgroup: Backport for Switch five large wikis to extlinks read new (T335343) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
11:37 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse1002.eqiad.wmnet
11:37 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema
11:37 ladsgroup@deploy1002: Started scap: Backport for Switch five large wikis to extlinks read new (T335343)
11:32 ladsgroup@deploy1002: Finished scap: Backport for Remove nlwiki from windows-1252 encoding (T128154) (duration: 17m 38s)
11:31 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
11:29 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
11:28 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
11:16 ladsgroup@deploy1002: ladsgroup: Backport for Remove nlwiki from windows-1252 encoding (T128154) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
11:14 ladsgroup@deploy1002: Started scap: Backport for Remove nlwiki from windows-1252 encoding (T128154)
11:11 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS bullseye
11:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5028.eqsin.wmnet
11:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5020.eqsin.wmnet
10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:57 fabfur: reboot cp5020 and cp5028 for kernel upgrade (T335835)
10:57 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5020.eqsin.wmnet
10:57 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5028.eqsin.wmnet
10:56 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
10:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:34 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
10:34 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
10:34 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
10:30 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
10:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:22 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
10:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:18 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:17 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
10:16 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
10:15 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
10:14 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
10:09 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:07 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
10:06 btullis: removed hadoop packages incorrectly labelled for i386 in thirdparty/bigtop15 bullseye-wikimedia
10:04 Amir1: root@clouddb1021.eqiad.wmnet[metawiki]> ALTER TABLE pagelinks ROW_FORMAT=COMPRESSED; (T337961)
10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
10:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:57 moritzm: restarting FPM on mw canaries
09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:53 moritzm: installing openssl security updates on buster
09:51 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5019.eqsin.wmnet
09:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5027.eqsin.wmnet
09:43 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.private.codfw.wikimedia.cloud on all recursors
09:43 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.private.codfw.wikimedia.cloud on all recursors
09:42 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:42 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
09:41 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
09:39 aborrero@cumin2002: START - Cookbook sre.dns.netbox
09:34 fabfur: reboot cp5019 and cp5027 for kernel upgrade (T335835)
09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5019.eqsin.wmnet
09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5027.eqsin.wmnet
09:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5018.eqsin.wmnet
09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5026.eqsin.wmnet
09:08 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bullseye
09:07 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.mgmt.codfw.wmnet on all recursors
09:07 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.mgmt.codfw.wmnet on all recursors
09:06 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.codfw.wmnet on all recursors
09:06 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.codfw.wmnet on all recursors
09:05 elukey: move varnishkafka instances in ulsfo to PKI - T337825
09:05 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:05 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev - aborrero@cumin2002"
09:04 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev - aborrero@cumin2002"
09:02 aborrero@cumin2002: START - Cookbook sre.dns.netbox
09:01 fabfur: reboot cp5018 and cp5026 for kernel upgrade (T335835)
09:01 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5018.eqsin.wmnet
09:01 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5026.eqsin.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
09:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1492.eqiad.wmnet
09:00 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1492.eqiad.wmnet
08:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1492.eqiad.wmnet
08:59 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1492.eqiad.wmnet
08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1492.eqiad.wmnet with OS buster
08:52 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1001"
08:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5025.eqsin.wmnet
08:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5017.eqsin.wmnet
08:20 fabfur: reboot cp5017 and cp5025 for kernel upgrade (T335835)
08:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5017.eqsin.wmnet
08:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5025.eqsin.wmnet
08:15 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
08:13 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.13 refs T337527
08:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
08:11 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
08:10 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
07:55 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1001"
07:34 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49438 and previous config saved to /var/cache/conftool/dbconfig/20230615-073248-root.json
07:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49437 and previous config saved to /var/cache/conftool/dbconfig/20230615-071744-root.json
07:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host mw1492.eqiad.wmnet with OS buster
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49436 and previous config saved to /var/cache/conftool/dbconfig/20230615-070239-root.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49435 and previous config saved to /var/cache/conftool/dbconfig/20230615-064734-root.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49434 and previous config saved to /var/cache/conftool/dbconfig/20230615-063230-root.json
06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID 2066
06:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID 2066
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49433 and previous config saved to /var/cache/conftool/dbconfig/20230615-061725-root.json
06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49432 and previous config saved to /var/cache/conftool/dbconfig/20230615-060220-root.json
05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49431 and previous config saved to /var/cache/conftool/dbconfig/20230615-054716-root.json
05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 to upgrade to 10.6.14 T338918', diff saved to https://phabricator.wikimedia.org/P49430 and previous config saved to /var/cache/conftool/dbconfig/20230615-053318-root.json

2023-06-14

23:38 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
23:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
21:38 mutante: phabricator - made dancy (https://phabricator.wikimedia.org/people/manage/25411/) an administrator (T339174)
21:02 taavi@deploy1002: Finished scap: Backport for Fix thumb styling on file description page (T337804) (duration: 10m 44s)
20:54 taavi@deploy1002: arlolra and taavi: Backport for Fix thumb styling on file description page (T337804) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:52 taavi@deploy1002: Started scap: Backport for Fix thumb styling on file description page (T337804)
20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
20:29 taavi@deploy1002: Finished scap: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974) (duration: 10m 23s)
20:21 taavi@deploy1002: taavi and albertoleoncio: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:19 taavi@deploy1002: Started scap: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974)
20:12 taavi@deploy1002: Finished scap: Backport for simplewiki: Remove "changetags" from registered user (T339124) (duration: 08m 55s)
20:10 mutante: https://ticket.wikimedia.org down for migration
20:06 taavi@deploy1002: taavi and stang: Backport for simplewiki: Remove "changetags" from registered user (T339124) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:03 taavi@deploy1002: Started scap: Backport for simplewiki: Remove "changetags" from registered user (T339124)
20:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
20:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
18:29 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@3d6caed]: Deploying mostly to rerun druid loading for mediawiki history reduced (duration: 00m 09s)
18:29 milimetric@deploy1002: Started deploy [airflow-dags/analytics@3d6caed]: Deploying mostly to rerun druid loading for mediawiki history reduced
18:11 moritzm: installing libssh security updates on buster
17:04 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
17:03 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
17:03 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
17:02 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
17:01 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
17:01 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
16:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
16:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1492.mgmt.eqiad.wmnet with reboot policy FORCED
16:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw1492.mgmt.eqiad.wmnet with reboot policy FORCED
15:42 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 02m 03s)
15:40 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
15:00 lucaswerkmeister-wmde: Deployed security patch for T250720
14:53 lucaswerkmeister-wmde: Deployed security patch for T250720
14:36 Amir1: mwscript findBadBlobs.php --wiki=nlwiki --revisions 880583,880584,880585,880586,880587,880588,880589,880590,880591,880592,880593,880594,880595,880596,880597,880598,880599,880600,880601,880602 --mark "T128154"
14:33 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:32 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:30 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:22 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Declare mediawiki.page_outlink_topic_prediction_change.v1 stream - T328899 (duration: 10m 25s)
14:19 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:17 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
14:17 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
14:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:15 bblack: dns2006: updating gdnsd package
14:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
14:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:59 topranks: adjusting port buffer partition asw2-esams T284592
13:58 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
13:58 topranks: adjusting port buffer partition asw1-eqsin T284592
13:58 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
13:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
13:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
13:57 topranks: adjusting port buffer partition asw2-ulsfo T284592
13:53 topranks: adjusting port buffer partition asw-d-codfw T284592
13:52 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:49 topranks: adjusting port buffer partition asw-c-codfw T284592
13:46 topranks: adjusting port buffer partition asw-b-codfw T284592
13:44 moritzm: imported jenkins 2.401.1 to thirdparty/ci for buster-wikimedia
13:42 topranks: adjusting port buffer partition asw-a-codfw T284592
13:42 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:14 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:05 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:57 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:47 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.13 refs T337527 (duration: 06m 10s)
12:41 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.13 refs T337527
12:29 ladsgroup@deploy1002: Finished scap: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094) (duration: 08m 21s)
12:23 ladsgroup@deploy1002: ladsgroup: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
12:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw1492.eqiad.wmnet
12:21 ladsgroup@deploy1002: Started scap: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094)
12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1492.eqiad.wmnet
12:01 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix mgmt for cloudservices2004-dev - jbond@cumin1001"
12:00 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix mgmt for cloudservices2004-dev - jbond@cumin1001"
11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
11:11 XioNoX: eqiad row D, move VRRP primary back to cr2 - T313463
11:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix mgmt - jbond@cumin1001"
11:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix mgmt - jbond@cumin1001"
11:00 XioNoX: disable cr2<->row D link for link migration - T313463
10:40 XioNoX: eqiad row D, move VRRP primary to cr1 - T313463
10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1040-1043].eqiad.wmnet
10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1040-1043].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1001"
10:24 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1040-1043].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1001"
10:22 mvernon@cumin1001: START - Cookbook sre.dns.netbox
10:11 XioNoX: disable cr1<->row D link for link migration - T313463
10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.13 refs T337527
10:03 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1040-1043].eqiad.wmnet
10:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:28 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.13 refs T337527 (duration: 06m 56s)
09:21 moritzm: installing php7.4 security updates
09:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.13 refs T337527
09:11 hashar: zuul: rolled back config changes for T309376 and restarted Zuul. CI is back up.
09:00 tgr_: UTC morning deploys done
08:59 tgr@deploy1002: Finished scap: Backport for Section images: Pass section parameters to VE in add image tasks (T339046) (duration: 07m 55s)
08:58 hashar: Rolling back Zuul config change and restarting Zuul to clear ssh connections
08:53 tgr@deploy1002: tgr: Backport for Section images: Pass section parameters to VE in add image tasks (T339046) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
08:51 tgr@deploy1002: Started scap: Backport for Section images: Pass section parameters to VE in add image tasks (T339046)
08:51 hashar: Restarting Zuul to apply config change for T309376
08:48 tgr@deploy1002: Finished scap: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927) (duration: 08m 14s)
08:41 tgr@deploy1002: tgr: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
08:40 tgr@deploy1002: Started scap: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927)
08:18 tgr@deploy1002: Finished scap: Backport for Structured tasks: Fix toolbar rewriting (T338934) (duration: 12m 52s)
08:07 tgr@deploy1002: tgr: Backport for Structured tasks: Fix toolbar rewriting (T338934) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
08:05 tgr@deploy1002: Started scap: Backport for Structured tasks: Fix toolbar rewriting (T338934)
07:46 tgr_: backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/929966 (can't edit wikitech due to DB issues)
07:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 29
07:40 ayounsi@cumin2002: START - Cookbook sre.network.debug for Netbox circuit ID 29
07:32 tgr_: test
07:31 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123) (duration: 09m 54s)
07:23 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
07:21 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123)
07:19 kartik@deploy1002: Backport cancelled.
07:18 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669) (duration: 13m 35s)
07:06 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669)
07:04 marostegui: Test
04:34 ejegg: civicrm upgraded from fd87e0df to d61220cd
04:01 ejegg: civicrm upgraded from a675c2c9 to fd87e0df
01:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
01:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
01:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
01:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
01:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
01:05 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
00:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye

2023-06-13

23:57 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
23:40 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
23:00 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
22:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
21:26 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
21:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1004.eqiad.wmnet with reason: first setup
20:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people1004.eqiad.wmnet with reason: first setup
20:55 ebernhardson@deploy1002: Finished scap: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194) (duration: 07m 36s)
20:49 ebernhardson@deploy1002: ebernhardson: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1004.eqiad.wmnet
20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people1004.eqiad.wmnet with OS bookworm
20:48 ebernhardson@deploy1002: Started scap: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194)
20:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people1004.eqiad.wmnet with reason: host reimage
20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on people1004.eqiad.wmnet with reason: host reimage
20:29 urbanecm@deploy1002: Finished scap: Backport for Exclude after-aligned tools when creating target widgets (T338978) (duration: 08m 10s)
20:22 urbanecm@deploy1002: matmarex and urbanecm: Backport for Exclude after-aligned tools when creating target widgets (T338978) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:21 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people1004.eqiad.wmnet with OS bookworm
20:20 urbanecm@deploy1002: Started scap: Backport for Exclude after-aligned tools when creating target widgets (T338978)
20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
20:17 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
20:17 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
20:17 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
20:17 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
20:14 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
20:09 urbanecm: Start `foreachwikiindblist 'group2 & s7' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all` in a tmux session on mwmaint1002 (T315510)
20:01 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:01 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
19:56 eileen: civicrm: revision a675c2c9, config c83f9a1a
19:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2003.codfw.wmnet
19:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people2003.codfw.wmnet with OS bookworm
19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people2003.codfw.wmnet with reason: host reimage
19:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on people2003.codfw.wmnet with reason: host reimage
19:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
19:18 bblack: dns4004 - downtime removed, agent back to normal, etc
19:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
19:17 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people2003.codfw.wmnet with OS bookworm
19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
19:08 bblack: dns4004: downtiming and stopping agent for a bit, to test some new software
19:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
19:08 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
18:48 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
18:47 Amir1: root@clouddb1021.eqiad.wmnet[commonswiki]> ALTER TABLE externallinks ROW_FORMAT=COMPRESSED; (T337961)
18:44 ladsgroup@deploy1002: Finished scap: Backport for Retrieve external links from PreparedUpdate (T65632 T264104) (duration: 12m 18s)
18:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2003.codfw.wmnet
18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1017.eqiad.wmnet with OS buster
18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:35 mutante: grafana2001 - apt-get clean
18:34 ladsgroup@deploy1002: ladsgroup: Backport for Retrieve external links from PreparedUpdate (T65632 T264104) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
18:32 ladsgroup@deploy1002: Started scap: Backport for Retrieve external links from PreparedUpdate (T65632 T264104)
18:30 mutante: ganeti2021 - deleting VM people2003
18:30 mutante: ganeti1028 - deleting VM people2003
18:29 mutante: ganeti1028 - deleting VM people1004
18:29 Amir1: root@clouddb1021.eqiad.wmnet[ruwikinews]> ALTER TABLE externallinks ROW_FORMAT=COMPRESSED; (T337961)
18:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
18:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1016.eqiad.wmnet with OS buster
18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
18:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
17:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
17:55 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host snapshot1016.eqiad.wmnet with OS buster
17:38 ladsgroup@deploy1002: Finished scap: Backport for Make old_links retrieval cleaner (duration: 18m 09s)
17:28 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009f (duration: 01m 25s)
17:27 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009f
17:22 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009 (duration: 00m 02s)
17:22 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009
17:22 ladsgroup@deploy1002: ladsgroup: Backport for Make old_links retrieval cleaner synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
17:22 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c337e2f] (duration: 01m 43s)
17:20 ladsgroup@deploy1002: Started scap: Backport for Make old_links retrieval cleaner
17:20 otto@deploy1002: Started deploy [analytics/refinery@c337e2f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c337e2f]
17:20 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f] (thin): Regular analytics weekly train THIN [analytics/refinery@c337e2f] (duration: 00m 04s)
17:20 otto@deploy1002: Started deploy [analytics/refinery@c337e2f] (thin): Regular analytics weekly train THIN [analytics/refinery@c337e2f]
17:13 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] (duration: 07m 51s)
17:06 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f]
17:05 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] (duration: 24m 03s)
16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1017.eqiad.wmnet with reason: host reimage
16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2112.codfw.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2112.codfw.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1017.eqiad.wmnet with reason: host reimage
16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2130.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2116.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2102.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2102.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2140.codfw.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2140.codfw.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2136.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2136.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2119.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2119.codfw.wmnet with reason: Maintenance
16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2110.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2106.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2106.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2099.codfw.wmnet with reason: Maintenance
16:41 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f]
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2099.codfw.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1149.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1149.eqiad.wmnet with reason: Maintenance
16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1148.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1148.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1147.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1147.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1143.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1143.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1142.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1142.eqiad.wmnet with reason: Maintenance
16:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1017.eqiad.wmnet with OS buster
16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1141.eqiad.wmnet with reason: Maintenance
16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1141.eqiad.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1138.eqiad.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1138.eqiad.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2107.codfw.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1162.eqiad.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2126.codfw.wmnet with reason: Maintenance
16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2126.codfw.wmnet with reason: Maintenance
16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2125.codfw.wmnet with reason: Maintenance
16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2104.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2104.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1197.eqiad.wmnet with reason: Maintenance
16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1197.eqiad.wmnet with reason: Maintenance
16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1156.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1156.eqiad.wmnet with reason: Maintenance
16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1129.eqiad.wmnet with reason: Maintenance
16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1129.eqiad.wmnet with reason: Maintenance
16:19 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:19 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1017.eqiad.wmnet with OS buster
16:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
16:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1157.eqiad.wmnet with reason: Maintenance
15:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1157.eqiad.wmnet with reason: Maintenance
15:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
15:45 SandraEbele: Deployed refinery-source using jenkins
15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1149']
15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
15:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
15:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
15:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
15:28 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
15:28 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
15:27 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host snapshot1016.eqiad.wmnet with OS buster
15:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
15:16 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
15:15 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
15:14 SandraEbele: deploying refinery source
15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2177.codfw.wmnet with reason: Maintenance
15:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
15:02 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
15:01 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
15:00 elukey: run kafka re-assign partitions for eqiad.change-prop.transcludes.resource-change on kafka-main1001 - T338357
14:59 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
14:58 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
14:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
14:57 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
14:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2156.codfw.wmnet with reason: Maintenance
14:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2149.codfw.wmnet with reason: Maintenance
14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2149.codfw.wmnet with reason: Maintenance
14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
14:05 fabfur: reboot cp4044 and cp4052 for kernel upgrade (T335835)
14:05 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
14:05 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
14:03 claime: Revert noc.wikimedia.org to eqiad, running authdns-update - T331634
13:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
13:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2127.codfw.wmnet with reason: Maintenance
13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2127.codfw.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
13:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:42 urbanecm@deploy1002: Finished scap: Backport for Section images: Fix image placeholder alignment for RTL content (T338837) (duration: 10m 29s)
13:41 sukhe: disable puppet on R:Class bird::anycast_healthchecker to merge CR 928804
13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
13:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
13:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4043.ulsfo.wmnet
13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
13:33 urbanecm@deploy1002: kharlan and urbanecm: Backport for Section images: Fix image placeholder alignment for RTL content (T338837) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
13:31 urbanecm@deploy1002: Started scap: Backport for Section images: Fix image placeholder alignment for RTL content (T338837)
13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
13:25 fabfur: reboot cp4043 and cp4051 for kernel upgrade (T335835)
13:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4043.ulsfo.wmnet
13:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
13:21 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
13:21 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
13:20 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
13:20 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
13:19 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
13:15 urbanecm@deploy1002: Finished scap: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724) (duration: 09m 29s)
13:07 urbanecm@deploy1002: migr and urbanecm: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:06 urbanecm@deploy1002: Started scap: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724)
13:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
13:01 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4042.ulsfo.wmnet
13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49421 and previous config saved to /var/cache/conftool/dbconfig/20230613-130129-ladsgroup.json
13:01 moritzm: installing nbconvert security updates
12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
12:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
12:51 fabfur: reboot cp4042 and cp4050 for kernel upgrade (T335835)
12:51 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4042.ulsfo.wmnet
12:51 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
12:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49420 and previous config saved to /var/cache/conftool/dbconfig/20230613-124623-ladsgroup.json
12:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
12:45 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
12:44 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
12:44 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:44 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49419 and previous config saved to /var/cache/conftool/dbconfig/20230613-123117-ladsgroup.json
12:29 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4049.ulsfo.wmnet
12:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4041.ulsfo.wmnet
12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance
12:18 fabfur: reboot cp4041 and cp4049 for kernel upgrade (T335835)
12:18 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4041.ulsfo.wmnet
12:18 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49418 and previous config saved to /var/cache/conftool/dbconfig/20230613-121611-ladsgroup.json
12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
12:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
12:09 hashar: Restarted Zuul CI due to T309376
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance
11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance
11:45 Amir1: cat wikis_having_stubs | xargs -I {} bash -c 'echo {}; touch /home/ladsgroup/{}.undo.sql; chmod 777 /home/ladsgroup/{}.undo.sql; mwscript maintenance/storage/moveToExternal.php --wiki={} --end 200000000 --undo /home/ladsgroup/{}.undo.sql DB cluster26' (T299387)
11:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
11:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4040.ulsfo.wmnet
11:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049)
11:40 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049)
11:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049)
11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance
11:36 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: Also check for utf8 encoding before trying to convert (duration: 09m 59s)
11:35 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049)
11:32 fabfur: reboot cp4040 and cp4048 for kernel upgrade (T335835)
11:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4040.ulsfo.wmnet
11:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49417 and previous config saved to /var/cache/conftool/dbconfig/20230613-113111-root.json
11:28 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: Also check for utf8 encoding before trying to convert synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
11:26 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: Also check for utf8 encoding before trying to convert
11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2118.codfw.wmnet with reason: Maintenance
11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1181.eqiad.wmnet with reason: Maintenance
11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2182.codfw.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2182.codfw.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2159.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2159.codfw.wmnet with reason: Maintenance
11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2150.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2122.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2122.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2121.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2121.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2120.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2120.codfw.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2108.codfw.wmnet with reason: Maintenance
11:20 ladsgroup@deploy1002: Finished scap: Backport for Set medium wikis to read new for externallinks (T335343) (duration: 10m 09s)
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2108.codfw.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1202.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1202.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1194.eqiad.wmnet with reason: Maintenance
11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1191.eqiad.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1191.eqiad.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1174.eqiad.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1174.eqiad.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1158.eqiad.wmnet with reason: Maintenance
11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1158.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1136.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1136.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1127.eqiad.wmnet with reason: Maintenance
11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49416 and previous config saved to /var/cache/conftool/dbconfig/20230613-111607-root.json
11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49415 and previous config saved to /var/cache/conftool/dbconfig/20230613-111549-ladsgroup.json
11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2165.codfw.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2165.codfw.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1126.eqiad.wmnet with reason: Maintenance
11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1126.eqiad.wmnet with reason: Maintenance
11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2181.codfw.wmnet with reason: Maintenance
11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2181.codfw.wmnet with reason: Maintenance
11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2166.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2166.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2164.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2164.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2163.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2162.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@deploy1002: ladsgroup: Backport for Set medium wikis to read new for externallinks (T335343) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2162.codfw.wmnet with reason: Maintenance
11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2161.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2161.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2154.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2154.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2152.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2152.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
11:10 ladsgroup@deploy1002: Started scap: Backport for Set medium wikis to read new for externallinks (T335343)
11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1216.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1216.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1211.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1211.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1193.eqiad.wmnet with reason: Maintenance
11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49414 and previous config saved to /var/cache/conftool/dbconfig/20230613-110746-ladsgroup.json
11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49413 and previous config saved to /var/cache/conftool/dbconfig/20230613-110102-root.json
10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
10:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
10:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
10:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4039.ulsfo.wmnet
10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
10:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49412 and previous config saved to /var/cache/conftool/dbconfig/20230613-105240-ladsgroup.json
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
10:46 fabfur: reboot cp4039 and cp4047 for kernel upgrade (T335835)
10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49411 and previous config saved to /var/cache/conftool/dbconfig/20230613-104557-root.json
10:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
10:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4039.ulsfo.wmnet
10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
10:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
10:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
10:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Maintenance
10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49410 and previous config saved to /var/cache/conftool/dbconfig/20230613-103734-ladsgroup.json
10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
10:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
10:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2129.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2129.codfw.wmnet with reason: Maintenance
10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance
10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance
10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1224.eqiad.wmnet with reason: Maintenance
10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1224.eqiad.wmnet with reason: Maintenance
10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49409 and previous config saved to /var/cache/conftool/dbconfig/20230613-103053-root.json
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1173.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1165.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1165.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2114.codfw.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49408 and previous config saved to /var/cache/conftool/dbconfig/20230613-102227-ladsgroup.json
10:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
10:18 Amir1: killed extensions/MachineVision/maintenance/prioritizeFilesWithTemplate.php; it was blocking a depool in s4
10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4038.ulsfo.wmnet
10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49407 and previous config saved to /var/cache/conftool/dbconfig/20230613-101548-root.json
10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49406 and previous config saved to /var/cache/conftool/dbconfig/20230613-101310-ladsgroup.json
10:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
10:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
10:07 fabfur: reboot cp4038 and cp4046 for kernel upgrade (T335835)
10:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4038.ulsfo.wmnet
10:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow2002.codfw.wmnet
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
10:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49405 and previous config saved to /var/cache/conftool/dbconfig/20230613-100043-root.json
09:58 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts netflow2002.codfw.wmnet
09:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
09:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49404 and previous config saved to /var/cache/conftool/dbconfig/20230613-094538-root.json
09:45 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:38 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
09:38 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4045.ulsfo.wmnet
09:24 fabfur: reboot cp4037 and cp4045 for kernel upgrade (T335835)
09:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
09:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4045.ulsfo.wmnet
09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 to upgrade to 10.6.14 T338918', diff saved to https://phabricator.wikimedia.org/P49403 and previous config saved to /var/cache/conftool/dbconfig/20230613-092208-root.json
09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6008.drmrs.wmnet
09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6016.drmrs.wmnet
09:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices2004-dev
09:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:07 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
09:03 fabfur: reboot cp6008 and cp6016 for kernel upgrade (T335835)
09:03 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6008.drmrs.wmnet
09:03 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6016.drmrs.wmnet
09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow1002.eqiad.wmnet with OS bookworm
08:59 aborrero@cumin2002: START - Cookbook sre.dns.netbox
08:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudservices2004-dev
08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1002.eqiad.wmnet with reason: host reimage
08:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow1002.eqiad.wmnet with reason: host reimage
08:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6015.drmrs.wmnet
08:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6007.drmrs.wmnet
08:25 vgutierrez: cleaning up prometheus-https service from IPVS on lvs2014 - T326657
08:22 fabfur: reboot cp6007 and cp6015 for kernel upgrade (T335835)
08:22 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6007.drmrs.wmnet
08:22 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6015.drmrs.wmnet
08:20 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.13 refs T337527
08:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow1002.eqiad.wmnet with OS bookworm
08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow3002.esams.wmnet with OS bookworm
08:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
08:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
07:53 fabfur: reboot cp6006.drmrs.wmnet and cp6014.drmrs.wmnet for kernel upgrade (T335835)
07:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
07:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow3002.esams.wmnet with reason: host reimage
07:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
07:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
07:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow3002.esams.wmnet with reason: host reimage
07:23 fabfur: rebooting cp6005.drmrs.wmnet and cp6013.drmrs.wmnet for upgrade
07:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
07:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
07:10 elukey: move varnishkafka instances on cp4037 to PKI TLS certs - T337825
07:09 kart_: Updated MinT to 2023-06-13-061519-production (T337656, T334465)
07:08 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow3002.esams.wmnet with OS bookworm
07:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
07:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
06:59 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:59 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
06:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
06:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
06:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
06:55 fabfur: rebooting cp6004.drmrs.wmnet and cp6012.drmrs.wmnet for upgrade
06:55 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
06:53 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
06:51 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:41 kart_: Updated cxserver to 2023-06-13-054849-production (T338123, T338146, T337834)
06:39 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:38 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:26 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:18 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:48 marostegui: dbmaint Deploy schema change on x1 eqiad with replication T337940
03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.11 (duration: 02m 13s)
03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.13 refs T337527 (duration: 49m 27s)
03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.13 refs T337527
02:54 eileen: civicrm upgraded from 5bbed553 to d63f548c
02:46 eileen: civicrm upgraded from 5bbed553 to d63f548c
00:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
00:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye

2023-06-12

23:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
23:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
23:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
23:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
23:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
23:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
22:22 brett: Roll restarting pybal on lvs2014 to revert prometheus service rollout - T326657
22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
22:07 cstone: payments-wiki upgraded from f3b229c6 to b1cf4f26
21:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
21:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
21:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
20:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS bullseye
20:36 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable new Impact module for rowiki (T336203) (duration: 07m 06s)
20:31 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable new Impact module for rowiki (T336203) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:29 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable new Impact module for rowiki (T336203)
20:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people2003.codfw.wmnet
20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
20:29 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people2003.codfw.wmnet - dzahn@cumin1001"
20:28 urbanecm: Run extensions/GrowthExperiments/maintenance/refreshUserImpactData.php for rowiki (T336203)
20:28 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people2003.codfw.wmnet - dzahn@cumin1001"
20:25 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:24 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable user impact refresh for rowiki (T336203) (duration: 06m 53s)
20:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
20:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host people2003.codfw.wmnet with OS bookworm
20:19 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable user impact refresh for rowiki (T336203) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:17 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable user impact refresh for rowiki (T336203)
20:16 urbanecm@deploy1002: Finished scap: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText (duration: 11m 33s)
20:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bullseye
20:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:11 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
20:06 urbanecm@deploy1002: daimona and urbanecm: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codf
20:04 urbanecm@deploy1002: Started scap: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText
20:03 brett: Roll restarting pybal on lvs2014 then lvs2013 - T326657
20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
19:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
19:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
19:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
19:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
19:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
19:35 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@fb9dba3]: repoint drafttopic ingestion to model specific stream (duration: 00m 10s)
19:35 ebernhardson@deploy1002: Started deploy [airflow-dags/search@fb9dba3]: repoint drafttopic ingestion to model specific stream
19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
19:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
19:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
19:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
19:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
19:05 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people2003.codfw.wmnet with OS bookworm
18:44 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
18:43 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
18:42 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
18:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
18:41 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
18:39 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2003.codfw.wmnet
18:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people1004.eqiad.wmnet
18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
18:37 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:36 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:33 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
18:33 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:32 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:26 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
18:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people1004.eqiad.wmnet
18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
18:25 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:24 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:21 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
18:21 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:20 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bullseye
18:14 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
18:09 dzahn@cumin1001: END (ERROR) - Cookbook sre.ganeti.makevm (exit_code=97) for new host people1004.eqiad.wmnet
18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
18:09 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:06 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
18:04 dzahn@cumin1001: START - Cookbook sre.dns.netbox
18:04 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host people1004.eqiad.wmnet with OS bookworm
17:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
17:15 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people1004.eqiad.wmnet with OS bookworm
17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
17:10 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
17:09 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
17:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
17:03 dzahn@cumin1001: START - Cookbook sre.dns.netbox
17:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
17:03 mutante: creating ganeti VM people1004 with os==bookworm passed to makevm cookbook to test bookworm and because this is traditionally an early adopter of new distro releases
16:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
16:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
16:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
16:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
16:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
16:07 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 03s)
16:02 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
16:01 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 21s)
15:59 fabfur: reboot cp6011.drmrs.wmnet for upgrade
15:59 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
15:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
15:43 fabfur: reboot cp6003.drmrs.wmnet for upgrade
15:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
15:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
15:25 fabfur: rebooting cp6010.drmrs.wmnet for upgrade
15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
15:23 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bullseye
15:17 fabfur: reboot cp6002.drmrs.wmnet for upgrade
15:14 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
15:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
15:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1011.eqiad.wmnet with OS bullseye
15:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
14:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
14:56 fabfur: reboot cp6009.drmrs.wmnet for upgrade
14:56 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
14:44 fabfur: rebooting cp6001.drmrs.wmnet for upgrade
14:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
14:29 zabe: Deployed updated mitigations for T336027
14:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
14:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
14:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
14:22 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
14:02 Lucas_WMDE: UTC afternoon backport+config window done
14:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107) (duration: 06m 49s)
14:01 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
13:57 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:55 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107)
13:54 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783) (duration: 06m 54s)
13:51 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783)
13:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760) (duration: 07m 27s)
13:40 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:38 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760)
13:32 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for ImageSuggestions: add help link to 4 new languages (T331036) (duration: 11m 23s)
13:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and mfossati: Backport for ImageSuggestions: add help link to 4 new languages (T331036) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:20 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for ImageSuggestions: add help link to 4 new languages (T331036)
13:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529) (duration: 10m 51s)
13:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
13:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
13:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
13:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
13:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow6001.drmrs.wmnet with OS bookworm
13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and daniel: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529)
12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow6001.drmrs.wmnet with reason: host reimage
12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow6001.drmrs.wmnet with reason: host reimage
12:28 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow6001.drmrs.wmnet with OS bookworm
12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
12:01 ladsgroup@deploy1002: Finished scap: Backport for Set small wikis to read new for externallinks (T335343) (duration: 12m 22s)
11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow5002.eqsin.wmnet with OS bookworm
11:50 ladsgroup@deploy1002: ladsgroup: Backport for Set small wikis to read new for externallinks (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
11:49 ladsgroup@deploy1002: Started scap: Backport for Set small wikis to read new for externallinks (T335343)
11:32 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
11:32 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
11:31 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
11:30 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
11:29 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
11:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow5002.eqsin.wmnet with reason: host reimage
11:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow5002.eqsin.wmnet with reason: host reimage
10:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
10:56 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
10:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
10:56 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
10:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow5002.eqsin.wmnet with OS bookworm
10:40 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=enwiki --start 31000000 --end 110000000 --undo /home/ladsgroup/T128151.undo.sql --iconv DB cluster27 (T128151)
10:08 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
09:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
09:48 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow4002.ulsfo.wmnet with OS bookworm
09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow4002.ulsfo.wmnet with reason: host reimage
08:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow4002.ulsfo.wmnet with reason: host reimage
08:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
08:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
08:50 taavi@deploy1002: Finished scap: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621) (duration: 16m 44s)
08:42 taavi@deploy1002: superpes and taavi: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow4002.ulsfo.wmnet with OS bookworm
08:33 taavi@deploy1002: Started scap: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621)
08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
08:30 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
08:30 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
07:01 moritzm: upgrading bookworm netboot images to final/released bookworm images T330495
06:54 kart_: Updated MinT to 2023-06-10-124931-production (T284905)
06:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
06:44 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
06:41 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
06:36 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
06:36 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
06:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
02:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/storage/moveToExternal.php --wiki=enwiki --end 32000000 --undo /home/ladsgroup/T128151.undo.sql --iconv DB cluster27 (T128151)

2023-06-10

17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
17:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye

2023-06-09

21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
20:38 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart-reboot (exit_code=97) rolling restart_daemons on A:aqs
20:23 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS bullseye
17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
17:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
17:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49398 and previous config saved to /var/cache/conftool/dbconfig/20230609-173202-ladsgroup.json
17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49397 and previous config saved to /var/cache/conftool/dbconfig/20230609-171656-ladsgroup.json
17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49396 and previous config saved to /var/cache/conftool/dbconfig/20230609-170150-ladsgroup.json
16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49395 and previous config saved to /var/cache/conftool/dbconfig/20230609-164644-ladsgroup.json
16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49394 and previous config saved to /var/cache/conftool/dbconfig/20230609-163007-ladsgroup.json
16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49393 and previous config saved to /var/cache/conftool/dbconfig/20230609-162946-ladsgroup.json
16:20 urandom: powercycling restbase1028
16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49392 and previous config saved to /var/cache/conftool/dbconfig/20230609-161440-ladsgroup.json
16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['snapshot1016']
16:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['snapshot1016']
15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49391 and previous config saved to /var/cache/conftool/dbconfig/20230609-155934-ladsgroup.json
15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49390 and previous config saved to /var/cache/conftool/dbconfig/20230609-154428-ladsgroup.json
15:30 andrewbogott: wikitech-static: deleted everything in /srv/mediawiki/images/wikitech/archive for T338520
15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49388 and previous config saved to /var/cache/conftool/dbconfig/20230609-152845-ladsgroup.json
15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49387 and previous config saved to /var/cache/conftool/dbconfig/20230609-152824-ladsgroup.json
15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49386 and previous config saved to /var/cache/conftool/dbconfig/20230609-151318-ladsgroup.json
14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49385 and previous config saved to /var/cache/conftool/dbconfig/20230609-145812-ladsgroup.json
14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49384 and previous config saved to /var/cache/conftool/dbconfig/20230609-144305-ladsgroup.json
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49383 and previous config saved to /var/cache/conftool/dbconfig/20230609-142731-ladsgroup.json
14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49382 and previous config saved to /var/cache/conftool/dbconfig/20230609-142655-ladsgroup.json
14:14 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49381 and previous config saved to /var/cache/conftool/dbconfig/20230609-141149-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49380 and previous config saved to /var/cache/conftool/dbconfig/20230609-135643-ladsgroup.json
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49379 and previous config saved to /var/cache/conftool/dbconfig/20230609-134137-ladsgroup.json
13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
13:29 sukhe: start pybal on lvs2013
13:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
13:25 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49378 and previous config saved to /var/cache/conftool/dbconfig/20230609-132541-ladsgroup.json
13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49377 and previous config saved to /var/cache/conftool/dbconfig/20230609-132520-ladsgroup.json
13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49376 and previous config saved to /var/cache/conftool/dbconfig/20230609-131014-ladsgroup.json
13:07 sukhe: stop pybal on lvs2013 to test lvs2014
13:02 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2014
13:02 sukhe: sudo cumin 'A:lvs and A:codfw' 'enable-puppet "CR 928818"'
13:01 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
12:59 sukhe: sudo cumin 'A:lvs and A:codfw' 'disable-puppet "CR 928818"'
12:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
12:57 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
12:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
12:55 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49373 and previous config saved to /var/cache/conftool/dbconfig/20230609-125508-ladsgroup.json
12:50 krinkle@deploy1002: Finished scap: I385d28 (duration: 06m 59s)
12:43 krinkle@deploy1002: Started scap: I385d28
12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49371 and previous config saved to /var/cache/conftool/dbconfig/20230609-124002-ladsgroup.json
12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
12:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
12:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49370 and previous config saved to /var/cache/conftool/dbconfig/20230609-122303-ladsgroup.json
12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49369 and previous config saved to /var/cache/conftool/dbconfig/20230609-122243-ladsgroup.json
12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
12:15 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
12:13 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49368 and previous config saved to /var/cache/conftool/dbconfig/20230609-120737-ladsgroup.json
11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 778 hosts
11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49367 and previous config saved to /var/cache/conftool/dbconfig/20230609-115230-ladsgroup.json
11:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 778 hosts
11:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 1262 hosts
11:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 1262 hosts
11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49366 and previous config saved to /var/cache/conftool/dbconfig/20230609-113724-ladsgroup.json
11:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49365 and previous config saved to /var/cache/conftool/dbconfig/20230609-112250-ladsgroup.json
11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49364 and previous config saved to /var/cache/conftool/dbconfig/20230609-112229-ladsgroup.json
11:20 sukhe: pcc-db1001: sudo systemctl start pcc_facts_processor.service
11:14 sukhe: sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49363 and previous config saved to /var/cache/conftool/dbconfig/20230609-110723-ladsgroup.json
11:02 sukhe: homer "cr*-codfw*" commit "Gerrit: 928113 add new LVS host lvs2014"
10:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49362 and previous config saved to /var/cache/conftool/dbconfig/20230609-105217-ladsgroup.json
10:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49361 and previous config saved to /var/cache/conftool/dbconfig/20230609-103711-ladsgroup.json
10:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49360 and previous config saved to /var/cache/conftool/dbconfig/20230609-102217-ladsgroup.json
10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49359 and previous config saved to /var/cache/conftool/dbconfig/20230609-102156-ladsgroup.json
10:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
10:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
10:12 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
10:09 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
10:08 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49358 and previous config saved to /var/cache/conftool/dbconfig/20230609-100650-ladsgroup.json
09:57 elukey: increase {eqiad,codfw}.change-prop.transcludes.resource-change topic partitions (3->5) on kafka main clusters - T338357
09:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
09:54 moritzm: installing jupyter-core security updates on bullseye
09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49357 and previous config saved to /var/cache/conftool/dbconfig/20230609-095144-ladsgroup.json
09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49356 and previous config saved to /var/cache/conftool/dbconfig/20230609-093638-ladsgroup.json
09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49355 and previous config saved to /var/cache/conftool/dbconfig/20230609-092141-ladsgroup.json
09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49354 and previous config saved to /var/cache/conftool/dbconfig/20230609-090829-ladsgroup.json
08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49353 and previous config saved to /var/cache/conftool/dbconfig/20230609-085322-ladsgroup.json
08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49352 and previous config saved to /var/cache/conftool/dbconfig/20230609-083816-ladsgroup.json
08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49351 and previous config saved to /var/cache/conftool/dbconfig/20230609-082310-ladsgroup.json
08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49350 and previous config saved to /var/cache/conftool/dbconfig/20230609-080708-ladsgroup.json
08:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49349 and previous config saved to /var/cache/conftool/dbconfig/20230609-080637-ladsgroup.json
07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49348 and previous config saved to /var/cache/conftool/dbconfig/20230609-075130-ladsgroup.json
07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49347 and previous config saved to /var/cache/conftool/dbconfig/20230609-073624-ladsgroup.json
07:33 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1492.eqiad.wmnet
07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49346 and previous config saved to /var/cache/conftool/dbconfig/20230609-072118-ladsgroup.json
07:19 moritzm: powercycling restbase2018 (kernel hung following what looks like I/O errors)
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49345 and previous config saved to /var/cache/conftool/dbconfig/20230609-070520-ladsgroup.json
07:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
07:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49344 and previous config saved to /var/cache/conftool/dbconfig/20230609-070459-ladsgroup.json
06:50 moritzm: installing wireshark security updates
06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49343 and previous config saved to /var/cache/conftool/dbconfig/20230609-064953-ladsgroup.json
06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster2005.codfw.wmnet
06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster2005.codfw.wmnet
06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster1005.eqiad.wmnet
06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster1005.eqiad.wmnet
06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus3001.esams.wmnet
06:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus3001.esams.wmnet
06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
06:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49342 and previous config saved to /var/cache/conftool/dbconfig/20230609-063447-ladsgroup.json
06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49341 and previous config saved to /var/cache/conftool/dbconfig/20230609-061941-ladsgroup.json
06:06 eileen: config 97c57848 -> 6f4a9d19 restart jobs
06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49340 and previous config saved to /var/cache/conftool/dbconfig/20230609-060438-ladsgroup.json
06:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
06:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
05:53 eileen: civicrm upgraded from 158896cc to 5bbed553
05:52 eileen: config revision changed from 8b71fa7a to 97c57848
05:50 moritzm: installing cpio security updates
05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49339 and previous config saved to /var/cache/conftool/dbconfig/20230609-052315-ladsgroup.json
05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49338 and previous config saved to /var/cache/conftool/dbconfig/20230609-050809-ladsgroup.json
04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49337 and previous config saved to /var/cache/conftool/dbconfig/20230609-045302-ladsgroup.json
04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49336 and previous config saved to /var/cache/conftool/dbconfig/20230609-043756-ladsgroup.json
04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49335 and previous config saved to /var/cache/conftool/dbconfig/20230609-042306-ladsgroup.json
04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49334 and previous config saved to /var/cache/conftool/dbconfig/20230609-042246-ladsgroup.json
04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49333 and previous config saved to /var/cache/conftool/dbconfig/20230609-040739-ladsgroup.json
03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49332 and previous config saved to /var/cache/conftool/dbconfig/20230609-035233-ladsgroup.json
03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49331 and previous config saved to /var/cache/conftool/dbconfig/20230609-033727-ladsgroup.json
03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49330 and previous config saved to /var/cache/conftool/dbconfig/20230609-032127-ladsgroup.json
03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49329 and previous config saved to /var/cache/conftool/dbconfig/20230609-032106-ladsgroup.json
03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49328 and previous config saved to /var/cache/conftool/dbconfig/20230609-030600-ladsgroup.json
02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49327 and previous config saved to /var/cache/conftool/dbconfig/20230609-025054-ladsgroup.json
02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49326 and previous config saved to /var/cache/conftool/dbconfig/20230609-023548-ladsgroup.json
02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49325 and previous config saved to /var/cache/conftool/dbconfig/20230609-022054-ladsgroup.json
02:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
02:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49324 and previous config saved to /var/cache/conftool/dbconfig/20230609-022034-ladsgroup.json
02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49323 and previous config saved to /var/cache/conftool/dbconfig/20230609-020528-ladsgroup.json
02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
02:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49322 and previous config saved to /var/cache/conftool/dbconfig/20230609-015021-ladsgroup.json
01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49321 and previous config saved to /var/cache/conftool/dbconfig/20230609-013515-ladsgroup.json
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS bullseye
01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49320 and previous config saved to /var/cache/conftool/dbconfig/20230609-011945-ladsgroup.json
01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
01:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49319 and previous config saved to /var/cache/conftool/dbconfig/20230609-011924-ladsgroup.json
01:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49318 and previous config saved to /var/cache/conftool/dbconfig/20230609-010418-ladsgroup.json
00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
00:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49317 and previous config saved to /var/cache/conftool/dbconfig/20230609-004912-ladsgroup.json
00:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
00:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
00:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS bullseye
00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49316 and previous config saved to /var/cache/conftool/dbconfig/20230609-003406-ladsgroup.json
00:31 eileen: civicrm upgraded from 6f64e77d to 158896cc
00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki-root1002']
00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
00:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki-root1002']
00:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49315 and previous config saved to /var/cache/conftool/dbconfig/20230609-001821-ladsgroup.json
00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49314 and previous config saved to /var/cache/conftool/dbconfig/20230609-001732-ladsgroup.json
00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49313 and previous config saved to /var/cache/conftool/dbconfig/20230609-000226-ladsgroup.json

2023-06-08

23:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
23:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49312 and previous config saved to /var/cache/conftool/dbconfig/20230608-234720-ladsgroup.json
23:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
23:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49311 and previous config saved to /var/cache/conftool/dbconfig/20230608-233214-ladsgroup.json
23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49310 and previous config saved to /var/cache/conftool/dbconfig/20230608-231650-ladsgroup.json
23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49309 and previous config saved to /var/cache/conftool/dbconfig/20230608-231629-ladsgroup.json
23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49308 and previous config saved to /var/cache/conftool/dbconfig/20230608-230123-ladsgroup.json
22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49307 and previous config saved to /var/cache/conftool/dbconfig/20230608-224617-ladsgroup.json
22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
22:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
22:37 mutante: gerrit1001 - rmdir /etc/ssh/userkeys/gerrit.d, which leads to puppet warnings because it can't remove the empty dir
22:35 mutante: removing gerrit role from former gerrit prod machine gerrit1001; removes firewall rules, shell access, monitoring, etc.
22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49306 and previous config saved to /var/cache/conftool/dbconfig/20230608-223111-ladsgroup.json
22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49305 and previous config saved to /var/cache/conftool/dbconfig/20230608-221536-ladsgroup.json
22:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
22:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49304 and previous config saved to /var/cache/conftool/dbconfig/20230608-221515-ladsgroup.json
22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49303 and previous config saved to /var/cache/conftool/dbconfig/20230608-220009-ladsgroup.json
21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49302 and previous config saved to /var/cache/conftool/dbconfig/20230608-214503-ladsgroup.json
21:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49301 and previous config saved to /var/cache/conftool/dbconfig/20230608-212957-ladsgroup.json
21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49300 and previous config saved to /var/cache/conftool/dbconfig/20230608-211419-ladsgroup.json
21:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
21:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
21:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
21:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49298 and previous config saved to /var/cache/conftool/dbconfig/20230608-204722-ladsgroup.json
20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49297 and previous config saved to /var/cache/conftool/dbconfig/20230608-203216-ladsgroup.json
20:31 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Make port part of the index (T337149) (duration: 10m 10s)
20:22 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Make port part of the index (T337149) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
20:20 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Make port part of the index (T337149)
20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49296 and previous config saved to /var/cache/conftool/dbconfig/20230608-201710-ladsgroup.json
20:12 ladsgroup@deploy1002: Finished scap: Backport for Remove VectorLimitedWidthIndicator (T336197) (duration: 07m 32s)
20:06 ladsgroup@deploy1002: ladsgroup and ksarabia: Backport for Remove VectorLimitedWidthIndicator (T336197) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:05 ladsgroup@deploy1002: Started scap: Backport for Remove VectorLimitedWidthIndicator (T336197)
20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49295 and previous config saved to /var/cache/conftool/dbconfig/20230608-200204-ladsgroup.json
20:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
19:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
19:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49294 and previous config saved to /var/cache/conftool/dbconfig/20230608-194555-ladsgroup.json
19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49293 and previous config saved to /var/cache/conftool/dbconfig/20230608-194534-ladsgroup.json
19:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49292 and previous config saved to /var/cache/conftool/dbconfig/20230608-193028-ladsgroup.json
19:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49291 and previous config saved to /var/cache/conftool/dbconfig/20230608-191522-ladsgroup.json
19:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49290 and previous config saved to /var/cache/conftool/dbconfig/20230608-190016-ladsgroup.json
18:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49289 and previous config saved to /var/cache/conftool/dbconfig/20230608-184312-ladsgroup.json
18:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49288 and previous config saved to /var/cache/conftool/dbconfig/20230608-184251-ladsgroup.json
18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49287 and previous config saved to /var/cache/conftool/dbconfig/20230608-182745-ladsgroup.json
18:24 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: maintenance
18:19 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: maintenance
18:18 urandom: (Re)pooling sessionstore/eqiad — T337426
18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49286 and previous config saved to /var/cache/conftool/dbconfig/20230608-181238-ladsgroup.json
18:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.12 refs T337526
17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49285 and previous config saved to /var/cache/conftool/dbconfig/20230608-175732-ladsgroup.json
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49284 and previous config saved to /var/cache/conftool/dbconfig/20230608-174135-ladsgroup.json
17:41 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
17:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:31 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:30 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:30 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:28 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49283 and previous config saved to /var/cache/conftool/dbconfig/20230608-172746-ladsgroup.json
17:24 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49282 and previous config saved to /var/cache/conftool/dbconfig/20230608-171240-ladsgroup.json
17:10 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
17:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1006.eqiad.wmnet with OS bullseye
17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49281 and previous config saved to /var/cache/conftool/dbconfig/20230608-165734-ladsgroup.json
16:56 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs
16:46 urandom: Starting traffic test against sessionstore.svc.eqiad.wmnet — T337426
16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49280 and previous config saved to /var/cache/conftool/dbconfig/20230608-164228-ladsgroup.json
16:41 urandom: Upgrading Cassandra to 4.1.1, sessionstore1003 — T337426
16:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
16:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
16:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetmaster1006.eqiad.wmnet with OS bullseye
16:35 urandom: Upgrading Cassandra to 4.1.1, sessionstore1002 — T337426
16:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49279 and previous config saved to /var/cache/conftool/dbconfig/20230608-162650-ladsgroup.json
16:26 urandom: Upgrading Cassandra to 4.1.1, sessionstore1001 — T337426
16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
16:22 urandom: creating pre-upgrade Cassandra snapshots, sessionstore/eqiad — T337426
16:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
16:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
16:11 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: maintenance
16:06 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
16:06 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: maintenance
16:06 urandom: depooling eqiad sessionstore for Cassandra upgrade — T337426
16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
15:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['puppetmaster1006']
15:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['puppetmaster1006']
15:09 moritzm: installing c-ares security updates on bullseye
14:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
14:42 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
14:41 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
14:36 moritzm: installing libwebp security updates on buster
14:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
14:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
14:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:13 XioNoX: cloudsw2-c8-eqiad> request system zeroize - T338459
14:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:09 XioNoX: decom cloudsw2-c8-eqiad - T338459
14:08 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
14:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:07 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
14:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
14:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:58 ladsgroup@deploy1002: Finished scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) (duration: 09m 13s)
13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
13:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
13:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:51 ladsgroup@deploy1002: ladsgroup: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:49 ladsgroup@deploy1002: Started scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)
13:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
13:48 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
13:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:43 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
13:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:39 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
13:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
13:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
13:05 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
13:05 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:36 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 17m 22s)
12:19 topranks: De-pooling lvs1017 to move link to lsw1-e1-eqiad to ssw1-e1-eqiad T322937
12:18 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:03 vgutierrez: restore cp4052 HAProxy configuration - T317799
11:51 vgutierrez: repooling cp4052 - T317799
11:40 vgutierrez: depooling cp4052 for some HAProxy tests - T317799
11:28 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=nlwiki --iconv DB cluster26 (T128154)
11:03 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=dawiki --iconv DB cluster27 (T128153)
10:49 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=svwiki --iconv DB cluster27 (T128153)
10:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:21 jiji@cumin1001: START - Cookbook sre.dns.netbox
09:58 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided) (duration: 00m 08s)
09:57 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided)
09:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2001.codfw.wmnet with OS bookworm
09:40 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
09:24 vgutierrez: updated to HAProxy 2.7.9 on cp4052 and cp5032
09:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
09:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
09:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
09:17 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
09:10 vgutierrez: fetch HAProxy 2.7.9 for thirdparty/haproxy27 bullseye (apt.wm.o)
08:54 apergos: UTC morning backport and config training window done
08:38 ariel@deploy1002: Finished scap: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430) (duration: 08m 25s)
08:31 ariel@deploy1002: ariel and superpes: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
08:30 ariel@deploy1002: Started scap: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430)
08:25 ariel@deploy1002: Finished scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) (duration: 09m 16s)
08:17 ariel@deploy1002: superpes and ariel: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
08:16 ariel@deploy1002: Started scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412)
08:12 ariel@deploy1002: Finished scap: Backport for [itwiktionary] Add a tagline (T337688) (duration: 08m 07s)
08:06 ariel@deploy1002: ariel and superpes: Backport for [itwiktionary] Add a tagline (T337688) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
08:04 ariel@deploy1002: Started scap: Backport for [itwiktionary] Add a tagline (T337688)
07:49 ariel@deploy1002: Finished scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) (duration: 09m 09s)
07:41 ariel@deploy1002: ariel and superpes: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:40 ariel@deploy1002: Started scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641)
07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49271 and previous config saved to /var/cache/conftool/dbconfig/20230608-073524-ladsgroup.json
07:27 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) (duration: 09m 19s)
07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49270 and previous config saved to /var/cache/conftool/dbconfig/20230608-072018-ladsgroup.json
07:19 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:17 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834)
07:14 elukey: delete pod kask-production-7dfdfc7cbc-2vw5q in wikikube codfw, since it was scheduled on a non dedicated node
07:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) (duration: 09m 52s)
07:06 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49268 and previous config saved to /var/cache/conftool/dbconfig/20230608-070512-ladsgroup.json
07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290)
06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49267 and previous config saved to /var/cache/conftool/dbconfig/20230608-065006-ladsgroup.json
06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49266 and previous config saved to /var/cache/conftool/dbconfig/20230608-064508-ladsgroup.json
06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49265 and previous config saved to /var/cache/conftool/dbconfig/20230608-064447-ladsgroup.json
06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49264 and previous config saved to /var/cache/conftool/dbconfig/20230608-062941-ladsgroup.json
06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49263 and previous config saved to /var/cache/conftool/dbconfig/20230608-061435-ladsgroup.json
06:10 elukey: kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet
05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49262 and previous config saved to /var/cache/conftool/dbconfig/20230608-055929-ladsgroup.json
05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49261 and previous config saved to /var/cache/conftool/dbconfig/20230608-055432-ladsgroup.json
05:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49260 and previous config saved to /var/cache/conftool/dbconfig/20230608-055411-ladsgroup.json
05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49259 and previous config saved to /var/cache/conftool/dbconfig/20230608-053904-ladsgroup.json
05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49258 and previous config saved to /var/cache/conftool/dbconfig/20230608-052358-ladsgroup.json
05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49257 and previous config saved to /var/cache/conftool/dbconfig/20230608-050852-ladsgroup.json
05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49256 and previous config saved to /var/cache/conftool/dbconfig/20230608-050353-ladsgroup.json
05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49255 and previous config saved to /var/cache/conftool/dbconfig/20230608-050328-ladsgroup.json
04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49254 and previous config saved to /var/cache/conftool/dbconfig/20230608-044821-ladsgroup.json
04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49253 and previous config saved to /var/cache/conftool/dbconfig/20230608-043315-ladsgroup.json
04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49252 and previous config saved to /var/cache/conftool/dbconfig/20230608-041809-ladsgroup.json
04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49251 and previous config saved to /var/cache/conftool/dbconfig/20230608-041311-ladsgroup.json
04:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49250 and previous config saved to /var/cache/conftool/dbconfig/20230608-040935-ladsgroup.json
03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49249 and previous config saved to /var/cache/conftool/dbconfig/20230608-035428-ladsgroup.json
03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49248 and previous config saved to /var/cache/conftool/dbconfig/20230608-033922-ladsgroup.json
03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49247 and previous config saved to /var/cache/conftool/dbconfig/20230608-032416-ladsgroup.json
03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49246 and previous config saved to /var/cache/conftool/dbconfig/20230608-031911-ladsgroup.json
03:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
03:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49245 and previous config saved to /var/cache/conftool/dbconfig/20230608-031901-ladsgroup.json
03:11 eileen: civicrm upgraded from 066095b8 to 6f64e77d
03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49244 and previous config saved to /var/cache/conftool/dbconfig/20230608-030355-ladsgroup.json
02:54 samtar@deploy1002: Finished scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) (duration: 09m 50s)
02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49243 and previous config saved to /var/cache/conftool/dbconfig/20230608-024849-ladsgroup.json
02:46 samtar@deploy1002: samtar: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
02:44 samtar@deploy1002: Started scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381)
02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49242 and previous config saved to /var/cache/conftool/dbconfig/20230608-023343-ladsgroup.json
02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49241 and previous config saved to /var/cache/conftool/dbconfig/20230608-022842-ladsgroup.json
02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49240 and previous config saved to /var/cache/conftool/dbconfig/20230608-022821-ladsgroup.json
02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49239 and previous config saved to /var/cache/conftool/dbconfig/20230608-021315-ladsgroup.json
01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49238 and previous config saved to /var/cache/conftool/dbconfig/20230608-015809-ladsgroup.json
01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49237 and previous config saved to /var/cache/conftool/dbconfig/20230608-014303-ladsgroup.json
01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49236 and previous config saved to /var/cache/conftool/dbconfig/20230608-013808-ladsgroup.json
01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
01:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49235 and previous config saved to /var/cache/conftool/dbconfig/20230608-013736-ladsgroup.json
01:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
01:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49234 and previous config saved to /var/cache/conftool/dbconfig/20230608-012230-ladsgroup.json
01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49233 and previous config saved to /var/cache/conftool/dbconfig/20230608-010853-ladsgroup.json
01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49232 and previous config saved to /var/cache/conftool/dbconfig/20230608-010724-ladsgroup.json
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49231 and previous config saved to /var/cache/conftool/dbconfig/20230608-005347-ladsgroup.json
00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49230 and previous config saved to /var/cache/conftool/dbconfig/20230608-005218-ladsgroup.json
00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49229 and previous config saved to /var/cache/conftool/dbconfig/20230608-004713-ladsgroup.json
00:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
00:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49228 and previous config saved to /var/cache/conftool/dbconfig/20230608-004653-ladsgroup.json
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49227 and previous config saved to /var/cache/conftool/dbconfig/20230608-003841-ladsgroup.json
00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49226 and previous config saved to /var/cache/conftool/dbconfig/20230608-003146-ladsgroup.json
00:28 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
00:28 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-cluster
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49225 and previous config saved to /var/cache/conftool/dbconfig/20230608-002335-ladsgroup.json
00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49224 and previous config saved to /var/cache/conftool/dbconfig/20230608-001640-ladsgroup.json
00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49223 and previous config saved to /var/cache/conftool/dbconfig/20230608-001555-ladsgroup.json
00:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49222 and previous config saved to /var/cache/conftool/dbconfig/20230608-001534-ladsgroup.json
00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49221 and previous config saved to /var/cache/conftool/dbconfig/20230608-000134-ladsgroup.json
00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49220 and previous config saved to /var/cache/conftool/dbconfig/20230608-000028-ladsgroup.json

2023-06-07

23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49219 and previous config saved to /var/cache/conftool/dbconfig/20230607-235624-ladsgroup.json
23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49218 and previous config saved to /var/cache/conftool/dbconfig/20230607-235603-ladsgroup.json
23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49217 and previous config saved to /var/cache/conftool/dbconfig/20230607-234522-ladsgroup.json
23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49216 and previous config saved to /var/cache/conftool/dbconfig/20230607-234057-ladsgroup.json
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49215 and previous config saved to /var/cache/conftool/dbconfig/20230607-233016-ladsgroup.json
23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49214 and previous config saved to /var/cache/conftool/dbconfig/20230607-232551-ladsgroup.json
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49213 and previous config saved to /var/cache/conftool/dbconfig/20230607-232223-ladsgroup.json
23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49212 and previous config saved to /var/cache/conftool/dbconfig/20230607-232203-ladsgroup.json
23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49211 and previous config saved to /var/cache/conftool/dbconfig/20230607-231045-ladsgroup.json
23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49210 and previous config saved to /var/cache/conftool/dbconfig/20230607-230657-ladsgroup.json
23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49209 and previous config saved to /var/cache/conftool/dbconfig/20230607-230540-ladsgroup.json
23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
23:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
22:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49208 and previous config saved to /var/cache/conftool/dbconfig/20230607-225926-ladsgroup.json
22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49207 and previous config saved to /var/cache/conftool/dbconfig/20230607-225150-ladsgroup.json
22:45 zabe@deploy1002: Finished scap: T338287 (duration: 07m 30s)
22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49206 and previous config saved to /var/cache/conftool/dbconfig/20230607-224420-ladsgroup.json
22:38 zabe@deploy1002: Started scap: T338287
22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49205 and previous config saved to /var/cache/conftool/dbconfig/20230607-223644-ladsgroup.json
22:34 zabe@deploy1002: Sync cancelled.
22:34 zabe@deploy1002: zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
22:32 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49204 and previous config saved to /var/cache/conftool/dbconfig/20230607-222914-ladsgroup.json
22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49203 and previous config saved to /var/cache/conftool/dbconfig/20230607-222905-ladsgroup.json
22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49202 and previous config saved to /var/cache/conftool/dbconfig/20230607-222844-ladsgroup.json
22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49201 and previous config saved to /var/cache/conftool/dbconfig/20230607-221408-ladsgroup.json
22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49200 and previous config saved to /var/cache/conftool/dbconfig/20230607-221338-ladsgroup.json
22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49199 and previous config saved to /var/cache/conftool/dbconfig/20230607-220859-ladsgroup.json
22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49198 and previous config saved to /var/cache/conftool/dbconfig/20230607-220821-ladsgroup.json
22:05 eileen: civicrm upgraded from bcc8fccc to 066095b8
22:05 zabe@deploy1002: Finished scap: Backport for Use cuc_timestamp as index field when reading old (T338287) (duration: 11m 48s)
21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49197 and previous config saved to /var/cache/conftool/dbconfig/20230607-215831-ladsgroup.json
21:55 zabe@deploy1002: dreamyjazz and zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
21:53 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49196 and previous config saved to /var/cache/conftool/dbconfig/20230607-215315-ladsgroup.json
21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49195 and previous config saved to /var/cache/conftool/dbconfig/20230607-214325-ladsgroup.json
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49194 and previous config saved to /var/cache/conftool/dbconfig/20230607-213809-ladsgroup.json
21:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
21:36 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49193 and previous config saved to /var/cache/conftool/dbconfig/20230607-213530-ladsgroup.json
21:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
21:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49192 and previous config saved to /var/cache/conftool/dbconfig/20230607-213509-ladsgroup.json
21:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
21:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
21:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1016.eqiad.wmnet
21:32 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1016.eqiad.wmnet
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49191 and previous config saved to /var/cache/conftool/dbconfig/20230607-212303-ladsgroup.json
21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49190 and previous config saved to /var/cache/conftool/dbconfig/20230607-212003-ladsgroup.json
21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49189 and previous config saved to /var/cache/conftool/dbconfig/20230607-211807-ladsgroup.json
21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
21:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49188 and previous config saved to /var/cache/conftool/dbconfig/20230607-211746-ladsgroup.json
21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49187 and previous config saved to /var/cache/conftool/dbconfig/20230607-210457-ladsgroup.json
21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49186 and previous config saved to /var/cache/conftool/dbconfig/20230607-210240-ladsgroup.json
20:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49185 and previous config saved to /var/cache/conftool/dbconfig/20230607-204951-ladsgroup.json
20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49184 and previous config saved to /var/cache/conftool/dbconfig/20230607-204734-ladsgroup.json
20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49183 and previous config saved to /var/cache/conftool/dbconfig/20230607-204728-ladsgroup.json
20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49182 and previous config saved to /var/cache/conftool/dbconfig/20230607-204652-ladsgroup.json
20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
20:35 catrope@deploy1002: Finished scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) (duration: 12m 12s)
20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49181 and previous config saved to /var/cache/conftool/dbconfig/20230607-203228-ladsgroup.json
20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49180 and previous config saved to /var/cache/conftool/dbconfig/20230607-203146-ladsgroup.json
20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49179 and previous config saved to /var/cache/conftool/dbconfig/20230607-202733-ladsgroup.json
20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
20:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
20:24 catrope@deploy1002: catrope: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49178 and previous config saved to /var/cache/conftool/dbconfig/20230607-202408-ladsgroup.json
20:23 catrope@deploy1002: Started scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064)
20:18 catrope@deploy1002: Finished scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) (duration: 10m 53s)
20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49177 and previous config saved to /var/cache/conftool/dbconfig/20230607-201640-ladsgroup.json
20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
20:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
20:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
20:11 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
20:09 catrope@deploy1002: catrope and essexigyan: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49176 and previous config saved to /var/cache/conftool/dbconfig/20230607-200902-ladsgroup.json
20:07 catrope@deploy1002: Started scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728)
20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49175 and previous config saved to /var/cache/conftool/dbconfig/20230607-200134-ladsgroup.json
19:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49174 and previous config saved to /var/cache/conftool/dbconfig/20230607-195356-ladsgroup.json
19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49173 and previous config saved to /var/cache/conftool/dbconfig/20230607-195316-ladsgroup.json
19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49172 and previous config saved to /var/cache/conftool/dbconfig/20230607-195255-ladsgroup.json
19:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
19:41 taavi: manually created 3 global accounts T338197
19:40 bblack: cp*: disabling puppet temporarily out of an abundance of caution
19:40 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
19:40 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49171 and previous config saved to /var/cache/conftool/dbconfig/20230607-193850-ladsgroup.json
19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49170 and previous config saved to /var/cache/conftool/dbconfig/20230607-193749-ladsgroup.json
19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49169 and previous config saved to /var/cache/conftool/dbconfig/20230607-193357-ladsgroup.json
19:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49168 and previous config saved to /var/cache/conftool/dbconfig/20230607-193326-ladsgroup.json
19:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49167 and previous config saved to /var/cache/conftool/dbconfig/20230607-192243-ladsgroup.json
19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49166 and previous config saved to /var/cache/conftool/dbconfig/20230607-191820-ladsgroup.json
19:16 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
19:11 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
19:11 urandom: (Re)pooling codfw sessionstore — T337426
19:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49165 and previous config saved to /var/cache/conftool/dbconfig/20230607-190737-ladsgroup.json
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49164 and previous config saved to /var/cache/conftool/dbconfig/20230607-190514-ladsgroup.json
19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49163 and previous config saved to /var/cache/conftool/dbconfig/20230607-190314-ladsgroup.json
19:02 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
18:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy1022.eqiad.wmnet with OS bullseye
18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
18:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49162 and previous config saved to /var/cache/conftool/dbconfig/20230607-184808-ladsgroup.json
18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49161 and previous config saved to /var/cache/conftool/dbconfig/20230607-184712-ladsgroup.json
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49160 and previous config saved to /var/cache/conftool/dbconfig/20230607-184411-ladsgroup.json
18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49159 and previous config saved to /var/cache/conftool/dbconfig/20230607-184351-ladsgroup.json
18:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49158 and previous config saved to /var/cache/conftool/dbconfig/20230607-183206-ladsgroup.json
18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49157 and previous config saved to /var/cache/conftool/dbconfig/20230607-182845-ladsgroup.json
18:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
18:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
18:20 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.12 refs T337526 (duration: 06m 05s)
18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49156 and previous config saved to /var/cache/conftool/dbconfig/20230607-181700-ladsgroup.json
18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.12 refs T337526
18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49155 and previous config saved to /var/cache/conftool/dbconfig/20230607-181339-ladsgroup.json
18:08 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided) (duration: 00m 33s)
18:07 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided)
18:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2014.codfw.wmnet with OS bullseye
18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49154 and previous config saved to /var/cache/conftool/dbconfig/20230607-180154-ladsgroup.json
17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49153 and previous config saved to /var/cache/conftool/dbconfig/20230607-175833-ladsgroup.json
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49152 and previous config saved to /var/cache/conftool/dbconfig/20230607-175347-ladsgroup.json
17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49151 and previous config saved to /var/cache/conftool/dbconfig/20230607-175337-ladsgroup.json
17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49150 and previous config saved to /var/cache/conftool/dbconfig/20230607-175327-ladsgroup.json
17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49149 and previous config saved to /var/cache/conftool/dbconfig/20230607-175316-ladsgroup.json
17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
17:46 inflatador: bking@wdqs depool wdqs2012 T321605
17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49148 and previous config saved to /var/cache/conftool/dbconfig/20230607-173821-ladsgroup.json
17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49147 and previous config saved to /var/cache/conftool/dbconfig/20230607-173810-ladsgroup.json
17:34 cwhite@cumin2002: dbctl commit (dc=all): 'depool db1135', diff saved to https://phabricator.wikimedia.org/P49146 and previous config saved to /var/cache/conftool/dbconfig/20230607-173453-cwhite.json
17:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
17:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49145 and previous config saved to /var/cache/conftool/dbconfig/20230607-172315-ladsgroup.json
17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49144 and previous config saved to /var/cache/conftool/dbconfig/20230607-172304-ladsgroup.json
17:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
17:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
17:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49143 and previous config saved to /var/cache/conftool/dbconfig/20230607-170808-ladsgroup.json
17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49142 and previous config saved to /var/cache/conftool/dbconfig/20230607-170758-ladsgroup.json
17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49141 and previous config saved to /var/cache/conftool/dbconfig/20230607-170551-ladsgroup.json
17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49140 and previous config saved to /var/cache/conftool/dbconfig/20230607-170530-ladsgroup.json
17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49139 and previous config saved to /var/cache/conftool/dbconfig/20230607-170252-ladsgroup.json
17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
17:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49138 and previous config saved to /var/cache/conftool/dbconfig/20230607-165934-ladsgroup.json
16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49137 and previous config saved to /var/cache/conftool/dbconfig/20230607-165024-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49135 and previous config saved to /var/cache/conftool/dbconfig/20230607-164428-ladsgroup.json
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49134 and previous config saved to /var/cache/conftool/dbconfig/20230607-163518-ladsgroup.json
16:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49133 and previous config saved to /var/cache/conftool/dbconfig/20230607-162922-ladsgroup.json
16:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2014']
16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49132 and previous config saved to /var/cache/conftool/dbconfig/20230607-162012-ladsgroup.json
16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49131 and previous config saved to /var/cache/conftool/dbconfig/20230607-161800-ladsgroup.json
16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49130 and previous config saved to /var/cache/conftool/dbconfig/20230607-161740-ladsgroup.json
16:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49129 and previous config saved to /var/cache/conftool/dbconfig/20230607-161416-ladsgroup.json
16:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
16:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49128 and previous config saved to /var/cache/conftool/dbconfig/20230607-160912-ladsgroup.json
16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49127 and previous config saved to /var/cache/conftool/dbconfig/20230607-160851-ladsgroup.json
16:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:04 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49126 and previous config saved to /var/cache/conftool/dbconfig/20230607-160234-ladsgroup.json
16:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
15:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:56 urandom: Beginning (3 hour) generated traffic testing of sessionstore.svc.codfw.wmnet — T337426
15:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49125 and previous config saved to /var/cache/conftool/dbconfig/20230607-155345-ladsgroup.json
15:52 urandom: Upgrading Cassandra to 4.1.1, sessionstore2003 — T337426
15:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
15:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49124 and previous config saved to /var/cache/conftool/dbconfig/20230607-154727-ladsgroup.json
15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
15:44 urandom: Upgrading Cassandra to 4.1.1, sessionstore2002 — T337426
15:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
15:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
15:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
15:39 moritzm: installing isc-dhcp bugfix updates from the Bullseye 11.7 point release
15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49123 and previous config saved to /var/cache/conftool/dbconfig/20230607-153839-ladsgroup.json
15:37 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
15:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49122 and previous config saved to /var/cache/conftool/dbconfig/20230607-153221-ladsgroup.json
15:26 moritzm: rolling restart of FPM on mw canaries to pick up libwebp security updates
15:26 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49121 and previous config saved to /var/cache/conftool/dbconfig/20230607-152456-ladsgroup.json
15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
15:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49120 and previous config saved to /var/cache/conftool/dbconfig/20230607-152425-ladsgroup.json
15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49119 and previous config saved to /var/cache/conftool/dbconfig/20230607-152333-ladsgroup.json
15:23 elukey: all varnishkafka instances on caching nodes are getting restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825
15:22 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:22 elukey: re-enable puppet on caching nodes
15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:21 claime: Bumping prewarmparsoid concurrency to 45 in changeprop-jobqueue - T320534
15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49118 and previous config saved to /var/cache/conftool/dbconfig/20230607-151835-ladsgroup.json
15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49117 and previous config saved to /var/cache/conftool/dbconfig/20230607-151815-ladsgroup.json
15:17 moritzm: installing libwebp security updates on buster
15:17 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
15:17 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver2001.codfw.wmnet with OS bookworm
15:14 urandom: Upgrading Cassandra to 4.1.1, sessionstore2001 — T337426
15:14 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:10 elukey: disable puppet on all caching nodes to roll out a varnishkafka change (ref: https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087)
15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49116 and previous config saved to /var/cache/conftool/dbconfig/20230607-150919-ladsgroup.json
15:08 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
15:07 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
15:06 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2001.mgmt.codfw.wmnet on all recursors
15:06 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetserver2001.mgmt.codfw.wmnet on all recursors
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49115 and previous config saved to /var/cache/conftool/dbconfig/20230607-150309-ladsgroup.json
15:02 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
15:02 urandom: de-pooling sessionstore/codfw — T337426
14:56 sukhe: homer "cr*-codfw*" commit "Gerrit: 928068 remove decommissioned host lvs2010"
14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1001.eqiad.wmnet with OS bookworm
14:54 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49114 and previous config saved to /var/cache/conftool/dbconfig/20230607-145413-ladsgroup.json
14:54 moritzm: installing postgresql 13 security updates (clients/libs, server instances all updated already)
14:53 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
14:51 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:50 jbond@cumin2002: START - Cookbook sre.dns.netbox
14:49 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2010.codfw.wmnet
14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:48 sukhe@cumin2002: START - Cookbook sre.dns.netbox
14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49112 and previous config saved to /var/cache/conftool/dbconfig/20230607-144803-ladsgroup.json
14:43 jbond@cumin2002: START - Cookbook sre.dns.netbox
14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
14:40 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_eqiad and A:cp
14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49111 and previous config saved to /var/cache/conftool/dbconfig/20230607-143907-ladsgroup.json
14:39 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2010.codfw.wmnet
14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
14:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
14:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_eqiad and A:cp
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49110 and previous config saved to /var/cache/conftool/dbconfig/20230607-143256-ladsgroup.json
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49109 and previous config saved to /var/cache/conftool/dbconfig/20230607-143235-ladsgroup.json
14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49108 and previous config saved to /var/cache/conftool/dbconfig/20230607-143215-ladsgroup.json
14:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49107 and previous config saved to /var/cache/conftool/dbconfig/20230607-142756-ladsgroup.json
14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49106 and previous config saved to /var/cache/conftool/dbconfig/20230607-142736-ladsgroup.json
14:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
14:25 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
14:24 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
14:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49104 and previous config saved to /var/cache/conftool/dbconfig/20230607-141709-ladsgroup.json
14:17 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2006-dev
14:16 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
14:14 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
14:14 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
14:14 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2006-dev
14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
14:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2005-dev
14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49103 and previous config saved to /var/cache/conftool/dbconfig/20230607-141230-ladsgroup.json
14:10 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) (duration: 09m 16s)
14:05 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
14:03 lucaswerkmeister-wmde@deploy1002: d3r1ck01 and lucaswerkmeister-wmde: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49102 and previous config saved to /var/cache/conftool/dbconfig/20230607-140203-ladsgroup.json
14:01 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264)
13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49101 and previous config saved to /var/cache/conftool/dbconfig/20230607-135724-ladsgroup.json
13:47 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable cache warming jobs for parsoid per default. (T329366) (duration: 10m 27s)
13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49100 and previous config saved to /var/cache/conftool/dbconfig/20230607-134656-ladsgroup.json
13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49099 and previous config saved to /var/cache/conftool/dbconfig/20230607-134218-ladsgroup.json
13:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
13:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49098 and previous config saved to /var/cache/conftool/dbconfig/20230607-133933-ladsgroup.json
13:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49097 and previous config saved to /var/cache/conftool/dbconfig/20230607-133854-ladsgroup.json
13:38 lucaswerkmeister-wmde@deploy1002: daniel and lucaswerkmeister-wmde: Backport for Enable cache warming jobs for parsoid per default. (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49096 and previous config saved to /var/cache/conftool/dbconfig/20230607-133725-ladsgroup.json
13:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable cache warming jobs for parsoid per default. (T329366)
13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49095 and previous config saved to /var/cache/conftool/dbconfig/20230607-133704-ladsgroup.json
13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
13:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49093 and previous config saved to /var/cache/conftool/dbconfig/20230607-132348-ladsgroup.json
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49092 and previous config saved to /var/cache/conftool/dbconfig/20230607-132158-ladsgroup.json
13:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
13:20 topranks: removing remote vlan configuration from lsw1-f1-eqiad T322937
13:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
13:10 ladsgroup@deploy1002: Finished scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" (duration: 07m 11s)
13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49090 and previous config saved to /var/cache/conftool/dbconfig/20230607-130841-ladsgroup.json
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49089 and previous config saved to /var/cache/conftool/dbconfig/20230607-130651-ladsgroup.json
13:04 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
13:03 ladsgroup@deploy1002: Started scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary""
13:02 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 11m 45s)
12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49088 and previous config saved to /var/cache/conftool/dbconfig/20230607-125335-ladsgroup.json
12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49087 and previous config saved to /var/cache/conftool/dbconfig/20230607-125145-ladsgroup.json
12:51 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
12:50 topranks: Depooling lvs1019 to move link from lsw1-f1-eqiad to ssw1-f1-eqiad
12:50 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
12:46 Amir1: mwscript maintenance/storage/moveToExternal.php --iconv DB cluster27 on dawiktionary and svwiktionary (T128155 and T128156)
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49086 and previous config saved to /var/cache/conftool/dbconfig/20230607-124543-ladsgroup.json
12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49085 and previous config saved to /var/cache/conftool/dbconfig/20230607-123926-ladsgroup.json
12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
12:36 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
12:33 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49084 and previous config saved to /var/cache/conftool/dbconfig/20230607-123002-ladsgroup.json
12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49083 and previous config saved to /var/cache/conftool/dbconfig/20230607-122420-ladsgroup.json
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49082 and previous config saved to /var/cache/conftool/dbconfig/20230607-121456-ladsgroup.json
12:13 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
12:12 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver1001.eqiad.wmnet on all recursors
12:12 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver1001.eqiad.wmnet on all recursors
12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver.eqiad.wmnet on all recursors
12:11 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver.eqiad.wmnet on all recursors
12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:10 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
12:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49081 and previous config saved to /var/cache/conftool/dbconfig/20230607-120914-ladsgroup.json
12:07 jbond@cumin1001: START - Cookbook sre.dns.netbox
12:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1001
12:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
12:06 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2001
12:04 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49080 and previous config saved to /var/cache/conftool/dbconfig/20230607-115950-ladsgroup.json
11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49079 and previous config saved to /var/cache/conftool/dbconfig/20230607-115408-ladsgroup.json
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49078 and previous config saved to /var/cache/conftool/dbconfig/20230607-115156-ladsgroup.json
11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49077 and previous config saved to /var/cache/conftool/dbconfig/20230607-115124-ladsgroup.json
11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
11:48 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver2001
11:46 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
11:46 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
11:45 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49076 and previous config saved to /var/cache/conftool/dbconfig/20230607-114444-ladsgroup.json
11:44 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver1001
11:43 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
11:43 jbond@cumin1001: START - Cookbook sre.dns.netbox
11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49075 and previous config saved to /var/cache/conftool/dbconfig/20230607-114120-ladsgroup.json
11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
11:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49074 and previous config saved to /var/cache/conftool/dbconfig/20230607-114059-ladsgroup.json
11:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:30 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2005
11:30 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1005
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
11:29 jbond@cumin2002: START - Cookbook sre.dns.netbox
11:27 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49073 and previous config saved to /var/cache/conftool/dbconfig/20230607-112553-ladsgroup.json
11:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
11:24 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2005
11:23 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts puppetmaster1005
11:22 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
11:17 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49072 and previous config saved to /var/cache/conftool/dbconfig/20230607-111047-ladsgroup.json
10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49071 and previous config saved to /var/cache/conftool/dbconfig/20230607-105541-ladsgroup.json
10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49070 and previous config saved to /var/cache/conftool/dbconfig/20230607-105215-ladsgroup.json
10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49069 and previous config saved to /var/cache/conftool/dbconfig/20230607-105154-ladsgroup.json
10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49068 and previous config saved to /var/cache/conftool/dbconfig/20230607-103648-ladsgroup.json
10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49066 and previous config saved to /var/cache/conftool/dbconfig/20230607-102141-ladsgroup.json
10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49065 and previous config saved to /var/cache/conftool/dbconfig/20230607-100635-ladsgroup.json
10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49064 and previous config saved to /var/cache/conftool/dbconfig/20230607-100307-ladsgroup.json
10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49063 and previous config saved to /var/cache/conftool/dbconfig/20230607-100247-ladsgroup.json
09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49062 and previous config saved to /var/cache/conftool/dbconfig/20230607-094740-ladsgroup.json
09:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49061 and previous config saved to /var/cache/conftool/dbconfig/20230607-093234-ladsgroup.json
09:21 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49060 and previous config saved to /var/cache/conftool/dbconfig/20230607-091728-ladsgroup.json
09:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49059 and previous config saved to /var/cache/conftool/dbconfig/20230607-091402-ladsgroup.json
09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49058 and previous config saved to /var/cache/conftool/dbconfig/20230607-091341-ladsgroup.json
09:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
09:06 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
09:00 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_eqiad and A:cp
08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_eqiad and A:cp
08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49057 and previous config saved to /var/cache/conftool/dbconfig/20230607-085835-ladsgroup.json
08:49 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49056 and previous config saved to /var/cache/conftool/dbconfig/20230607-084329-ladsgroup.json
08:34 fabfur: disable puppet on A:cp-eqiad for varnish <-> haproxy port 80 swap
08:29 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49055 and previous config saved to /var/cache/conftool/dbconfig/20230607-082823-ladsgroup.json
08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49054 and previous config saved to /var/cache/conftool/dbconfig/20230607-082500-ladsgroup.json
08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49053 and previous config saved to /var/cache/conftool/dbconfig/20230607-082434-ladsgroup.json
08:22 moritzm: uploaded ruby 2.5.5-3+deb10u5+wmf1 to apt.wikimedia.org, unbreaking Puppet runs after latest Ruby update for Buster T338294
08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49052 and previous config saved to /var/cache/conftool/dbconfig/20230607-080928-ladsgroup.json
07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49051 and previous config saved to /var/cache/conftool/dbconfig/20230607-075422-ladsgroup.json
07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49050 and previous config saved to /var/cache/conftool/dbconfig/20230607-073916-ladsgroup.json
07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49049 and previous config saved to /var/cache/conftool/dbconfig/20230607-073554-ladsgroup.json
07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
07:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49048 and previous config saved to /var/cache/conftool/dbconfig/20230607-073533-ladsgroup.json
07:22 kartik@deploy1002: Finished scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) (duration: 18m 06s)
07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49047 and previous config saved to /var/cache/conftool/dbconfig/20230607-072027-ladsgroup.json
07:06 kartik@deploy1002: kartik: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49046 and previous config saved to /var/cache/conftool/dbconfig/20230607-070521-ladsgroup.json
07:04 kartik@deploy1002: Started scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922)
06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49045 and previous config saved to /var/cache/conftool/dbconfig/20230607-065015-ladsgroup.json
06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49044 and previous config saved to /var/cache/conftool/dbconfig/20230607-064652-ladsgroup.json
06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49043 and previous config saved to /var/cache/conftool/dbconfig/20230607-064631-ladsgroup.json
06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49042 and previous config saved to /var/cache/conftool/dbconfig/20230607-064215-ladsgroup.json
06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49041 and previous config saved to /var/cache/conftool/dbconfig/20230607-063125-ladsgroup.json
06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49040 and previous config saved to /var/cache/conftool/dbconfig/20230607-062709-ladsgroup.json
06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49039 and previous config saved to /var/cache/conftool/dbconfig/20230607-061618-ladsgroup.json
06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49038 and previous config saved to /var/cache/conftool/dbconfig/20230607-061203-ladsgroup.json
06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49037 and previous config saved to /var/cache/conftool/dbconfig/20230607-060112-ladsgroup.json
05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49036 and previous config saved to /var/cache/conftool/dbconfig/20230607-055746-ladsgroup.json
05:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
05:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49035 and previous config saved to /var/cache/conftool/dbconfig/20230607-055726-ladsgroup.json
05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49034 and previous config saved to /var/cache/conftool/dbconfig/20230607-055655-ladsgroup.json
05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49033 and previous config saved to /var/cache/conftool/dbconfig/20230607-055320-ladsgroup.json
05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49032 and previous config saved to /var/cache/conftool/dbconfig/20230607-055259-ladsgroup.json
05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49031 and previous config saved to /var/cache/conftool/dbconfig/20230607-054220-ladsgroup.json
05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49030 and previous config saved to /var/cache/conftool/dbconfig/20230607-053753-ladsgroup.json
05:28 kart_: Updated cxserver to 2023-06-07-044025-production (T337290, T337669, T337834)
05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49029 and previous config saved to /var/cache/conftool/dbconfig/20230607-052713-ladsgroup.json
05:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:22 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49028 and previous config saved to /var/cache/conftool/dbconfig/20230607-052247-ladsgroup.json
05:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49027 and previous config saved to /var/cache/conftool/dbconfig/20230607-051207-ladsgroup.json
05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49026 and previous config saved to /var/cache/conftool/dbconfig/20230607-050844-ladsgroup.json
05:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
05:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49025 and previous config saved to /var/cache/conftool/dbconfig/20230607-050823-ladsgroup.json
05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49024 and previous config saved to /var/cache/conftool/dbconfig/20230607-050740-ladsgroup.json
05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49023 and previous config saved to /var/cache/conftool/dbconfig/20230607-050258-ladsgroup.json
05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:02 kart_: Updated MinT to 2023-06-06-120533-production (T337910, T337686, T337708)
05:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
05:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49022 and previous config saved to /var/cache/conftool/dbconfig/20230607-050237-ladsgroup.json
04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P49021 and previous config saved to /var/cache/conftool/dbconfig/20230607-045317-ladsgroup.json
04:51 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49020 and previous config saved to /var/cache/conftool/dbconfig/20230607-044731-ladsgroup.json
04:45 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
04:39 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P49019 and previous config saved to /var/cache/conftool/dbconfig/20230607-043810-ladsgroup.json
04:36 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
04:32 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49018 and previous config saved to /var/cache/conftool/dbconfig/20230607-043225-ladsgroup.json
04:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49017 and previous config saved to /var/cache/conftool/dbconfig/20230607-042304-ladsgroup.json
04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49016 and previous config saved to /var/cache/conftool/dbconfig/20230607-042040-ladsgroup.json
04:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
04:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49015 and previous config saved to /var/cache/conftool/dbconfig/20230607-041719-ladsgroup.json
04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49014 and previous config saved to /var/cache/conftool/dbconfig/20230607-041357-ladsgroup.json
04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49013 and previous config saved to /var/cache/conftool/dbconfig/20230607-041347-ladsgroup.json
04:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49012 and previous config saved to /var/cache/conftool/dbconfig/20230607-041326-ladsgroup.json
03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P49011 and previous config saved to /var/cache/conftool/dbconfig/20230607-035851-ladsgroup.json
03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49010 and previous config saved to /var/cache/conftool/dbconfig/20230607-035820-ladsgroup.json
03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P49009 and previous config saved to /var/cache/conftool/dbconfig/20230607-034345-ladsgroup.json
03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49008 and previous config saved to /var/cache/conftool/dbconfig/20230607-034314-ladsgroup.json
03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49007 and previous config saved to /var/cache/conftool/dbconfig/20230607-032839-ladsgroup.json
03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49006 and previous config saved to /var/cache/conftool/dbconfig/20230607-032808-ladsgroup.json
03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49005 and previous config saved to /var/cache/conftool/dbconfig/20230607-032522-ladsgroup.json
03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P49004 and previous config saved to /var/cache/conftool/dbconfig/20230607-032501-ladsgroup.json
03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49003 and previous config saved to /var/cache/conftool/dbconfig/20230607-032428-ladsgroup.json
03:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49002 and previous config saved to /var/cache/conftool/dbconfig/20230607-032407-ladsgroup.json
03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P49001 and previous config saved to /var/cache/conftool/dbconfig/20230607-030955-ladsgroup.json
03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49000 and previous config saved to /var/cache/conftool/dbconfig/20230607-030901-ladsgroup.json
02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P48999 and previous config saved to /var/cache/conftool/dbconfig/20230607-025449-ladsgroup.json
02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P48998 and previous config saved to /var/cache/conftool/dbconfig/20230607-025355-ladsgroup.json
02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P48997 and previous config saved to /var/cache/conftool/dbconfig/20230607-023943-ladsgroup.json
02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P48996 and previous config saved to /var/cache/conftool/dbconfig/20230607-023848-ladsgroup.json
02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P48995 and previous config saved to /var/cache/conftool/dbconfig/20230607-023624-ladsgroup.json
02:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P48994 and previous config saved to /var/cache/conftool/dbconfig/20230607-023613-ladsgroup.json
02:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
02:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48993 and previous config saved to /var/cache/conftool/dbconfig/20230607-023603-ladsgroup.json
02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48992 and previous config saved to /var/cache/conftool/dbconfig/20230607-023537-ladsgroup.json
02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P48991 and previous config saved to /var/cache/conftool/dbconfig/20230607-022057-ladsgroup.json
02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P48990 and previous config saved to /var/cache/conftool/dbconfig/20230607-022031-ladsgroup.json
02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P48989 and previous config saved to /var/cache/conftool/dbconfig/20230607-020550-ladsgroup.json
02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P48988 and previous config saved to /var/cache/conftool/dbconfig/20230607-020518-ladsgroup.json
01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48987 and previous config saved to /var/cache/conftool/dbconfig/20230607-015043-ladsgroup.json
01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48986 and previous config saved to /var/cache/conftool/dbconfig/20230607-015012-ladsgroup.json
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48985 and previous config saved to /var/cache/conftool/dbconfig/20230607-014635-ladsgroup.json
01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48984 and previous config saved to /var/cache/conftool/dbconfig/20230607-014626-ladsgroup.json
01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48983 and previous config saved to /var/cache/conftool/dbconfig/20230607-014614-ladsgroup.json
01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48982 and previous config saved to /var/cache/conftool/dbconfig/20230607-014605-ladsgroup.json
01:39 sukhe: run authdns-update: T338280
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48981 and previous config saved to /var/cache/conftool/dbconfig/20230607-013108-ladsgroup.json
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48980 and previous config saved to /var/cache/conftool/dbconfig/20230607-013059-ladsgroup.json
01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48979 and previous config saved to /var/cache/conftool/dbconfig/20230607-011602-ladsgroup.json
01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48978 and previous config saved to /var/cache/conftool/dbconfig/20230607-011553-ladsgroup.json
01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48977 and previous config saved to /var/cache/conftool/dbconfig/20230607-010055-ladsgroup.json
01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48976 and previous config saved to /var/cache/conftool/dbconfig/20230607-010047-ladsgroup.json
00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48975 and previous config saved to /var/cache/conftool/dbconfig/20230607-005722-ladsgroup.json
00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48974 and previous config saved to /var/cache/conftool/dbconfig/20230607-005713-ladsgroup.json
00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
00:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48973 and previous config saved to /var/cache/conftool/dbconfig/20230607-005654-ladsgroup.json
00:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
00:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48972 and previous config saved to /var/cache/conftool/dbconfig/20230607-005155-ladsgroup.json
00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48971 and previous config saved to /var/cache/conftool/dbconfig/20230607-004148-ladsgroup.json
00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48970 and previous config saved to /var/cache/conftool/dbconfig/20230607-003649-ladsgroup.json
00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48969 and previous config saved to /var/cache/conftool/dbconfig/20230607-002642-ladsgroup.json
00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48968 and previous config saved to /var/cache/conftool/dbconfig/20230607-002143-ladsgroup.json
00:14 urbanecm: Deployed security patch for T338276
00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48967 and previous config saved to /var/cache/conftool/dbconfig/20230607-001136-ladsgroup.json
00:08 urbanecm: Deployed security patch for T338276
00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48966 and previous config saved to /var/cache/conftool/dbconfig/20230607-000814-ladsgroup.json
00:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
00:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48965 and previous config saved to /var/cache/conftool/dbconfig/20230607-000754-ladsgroup.json
00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48964 and previous config saved to /var/cache/conftool/dbconfig/20230607-000637-ladsgroup.json
00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48963 and previous config saved to /var/cache/conftool/dbconfig/20230607-000337-ladsgroup.json
00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48962 and previous config saved to /var/cache/conftool/dbconfig/20230607-000316-ladsgroup.json
00:01 urbanecm: Deploying security patch for T338276

2023-06-06

23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48961 and previous config saved to /var/cache/conftool/dbconfig/20230606-235248-ladsgroup.json
23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48960 and previous config saved to /var/cache/conftool/dbconfig/20230606-234810-ladsgroup.json
23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48959 and previous config saved to /var/cache/conftool/dbconfig/20230606-233742-ladsgroup.json
23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48958 and previous config saved to /var/cache/conftool/dbconfig/20230606-233304-ladsgroup.json
23:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48955 and previous config saved to /var/cache/conftool/dbconfig/20230606-232235-ladsgroup.json
23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
23:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48954 and previous config saved to /var/cache/conftool/dbconfig/20230606-231913-ladsgroup.json
23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
23:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48953 and previous config saved to /var/cache/conftool/dbconfig/20230606-231853-ladsgroup.json
23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48952 and previous config saved to /var/cache/conftool/dbconfig/20230606-231758-ladsgroup.json
23:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:16 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
23:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
23:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48951 and previous config saved to /var/cache/conftool/dbconfig/20230606-231408-ladsgroup.json
23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
23:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48950 and previous config saved to /var/cache/conftool/dbconfig/20230606-231347-ladsgroup.json
23:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48949 and previous config saved to /var/cache/conftool/dbconfig/20230606-230347-ladsgroup.json
22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48948 and previous config saved to /var/cache/conftool/dbconfig/20230606-225841-ladsgroup.json
22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
22:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
22:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48947 and previous config saved to /var/cache/conftool/dbconfig/20230606-224841-ladsgroup.json
22:48 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48946 and previous config saved to /var/cache/conftool/dbconfig/20230606-224334-ladsgroup.json
22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48945 and previous config saved to /var/cache/conftool/dbconfig/20230606-223335-ladsgroup.json
22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48944 and previous config saved to /var/cache/conftool/dbconfig/20230606-223011-ladsgroup.json
22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48943 and previous config saved to /var/cache/conftool/dbconfig/20230606-222950-ladsgroup.json
22:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48942 and previous config saved to /var/cache/conftool/dbconfig/20230606-222828-ladsgroup.json
22:27 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp everywhere (T299954) (duration: 07m 33s)
22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48941 and previous config saved to /var/cache/conftool/dbconfig/20230606-222534-ladsgroup.json
22:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
22:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48940 and previous config saved to /var/cache/conftool/dbconfig/20230606-222513-ladsgroup.json
22:21 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp everywhere (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
22:19 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp everywhere (T299954)
22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48939 and previous config saved to /var/cache/conftool/dbconfig/20230606-221444-ladsgroup.json
22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48938 and previous config saved to /var/cache/conftool/dbconfig/20230606-221007-ladsgroup.json
21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48937 and previous config saved to /var/cache/conftool/dbconfig/20230606-215938-ladsgroup.json
21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48936 and previous config saved to /var/cache/conftool/dbconfig/20230606-215501-ladsgroup.json
21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48935 and previous config saved to /var/cache/conftool/dbconfig/20230606-214432-ladsgroup.json
21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48934 and previous config saved to /var/cache/conftool/dbconfig/20230606-214109-ladsgroup.json
21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
21:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
21:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48933 and previous config saved to /var/cache/conftool/dbconfig/20230606-214048-ladsgroup.json
21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48932 and previous config saved to /var/cache/conftool/dbconfig/20230606-213954-ladsgroup.json
21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48931 and previous config saved to /var/cache/conftool/dbconfig/20230606-213702-ladsgroup.json
21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48930 and previous config saved to /var/cache/conftool/dbconfig/20230606-213641-ladsgroup.json
21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48929 and previous config saved to /var/cache/conftool/dbconfig/20230606-212542-ladsgroup.json
21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48928 and previous config saved to /var/cache/conftool/dbconfig/20230606-212135-ladsgroup.json
21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48927 and previous config saved to /var/cache/conftool/dbconfig/20230606-211036-ladsgroup.json
21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48926 and previous config saved to /var/cache/conftool/dbconfig/20230606-210629-ladsgroup.json
21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48925 and previous config saved to /var/cache/conftool/dbconfig/20230606-205530-ladsgroup.json
20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48924 and previous config saved to /var/cache/conftool/dbconfig/20230606-205206-ladsgroup.json
20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
20:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48923 and previous config saved to /var/cache/conftool/dbconfig/20230606-205123-ladsgroup.json
20:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
20:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48922 and previous config saved to /var/cache/conftool/dbconfig/20230606-205002-ladsgroup.json
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48921 and previous config saved to /var/cache/conftool/dbconfig/20230606-204527-ladsgroup.json
20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48920 and previous config saved to /var/cache/conftool/dbconfig/20230606-204506-ladsgroup.json
20:41 urbanecm@deploy1002: Finished scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) (duration: 07m 23s)
20:35 urbanecm@deploy1002: urbanecm: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48919 and previous config saved to /var/cache/conftool/dbconfig/20230606-203456-ladsgroup.json
20:34 urbanecm@deploy1002: Started scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078)
20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48917 and previous config saved to /var/cache/conftool/dbconfig/20230606-203000-ladsgroup.json
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48916 and previous config saved to /var/cache/conftool/dbconfig/20230606-201950-ladsgroup.json
20:16 mutante: miscweb1003, miscweb2003 - rm -rf /srv/org/wikimedia/sitemaps after removing httpd virtual host T338064
20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48915 and previous config saved to /var/cache/conftool/dbconfig/20230606-201454-ladsgroup.json
20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48914 and previous config saved to /var/cache/conftool/dbconfig/20230606-200444-ladsgroup.json
19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48913 and previous config saved to /var/cache/conftool/dbconfig/20230606-195948-ladsgroup.json
19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48912 and previous config saved to /var/cache/conftool/dbconfig/20230606-195557-ladsgroup.json
19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48911 and previous config saved to /var/cache/conftool/dbconfig/20230606-195320-ladsgroup.json
19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48910 and previous config saved to /var/cache/conftool/dbconfig/20230606-193814-ladsgroup.json
19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48909 and previous config saved to /var/cache/conftool/dbconfig/20230606-192308-ladsgroup.json
19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48908 and previous config saved to /var/cache/conftool/dbconfig/20230606-190802-ladsgroup.json
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48907 and previous config saved to /var/cache/conftool/dbconfig/20230606-190420-ladsgroup.json
19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48906 and previous config saved to /var/cache/conftool/dbconfig/20230606-190402-ladsgroup.json
19:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
19:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
18:10 mutante: disabling https://sitemaps.wikimedia.org - T338064 T332101
18:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.12 refs T337526
18:01 sukhe: cumin 'A:cp-text' 'enable-puppet "CR 926611" && run-puppet-agent -q'
18:01 sukhe: re-enable puppet on A:cp-text and force puppet run: T338064
17:54 sukhe: enable puppet on cp4037 to test CR 926611
17:50 sukhe: disable puppet on A:cp-text to roll out CR 926611
17:39 sukhe: sudo cumin 'P:ntp' 'enable-puppet "testing CR 926598" && run-puppet-agent'
17:27 sukhe: sudo cumin 'P:ntp' 'disable-puppet "testing CR 926598"'
17:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
17:04 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
17:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
17:01 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:51 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:41 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:40 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:39 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
16:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
16:37 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:30 sukhe: low-traffic/codfw: set routing-options static route 10.2.1.0/24 next-hop 10.192.32.14
16:27 sukhe: restart pybal on lvs2013 to remove bgp-med override
16:23 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:12 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
16:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:06 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:03 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48904 and previous config saved to /var/cache/conftool/dbconfig/20230606-160151-ladsgroup.json
15:54 jbond@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
15:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:52 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48902 and previous config saved to /var/cache/conftool/dbconfig/20230606-154645-ladsgroup.json
15:46 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
15:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:40 cdanis@deploy1002: Finished scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" (duration: 08m 13s)
15:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:37 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:35 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:34 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:34 cdanis@deploy1002: cdanis and otto: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:32 zabe: purge wikimaniawiki logos # T337044
15:32 cdanis@deploy1002: Started scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion"
15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48901 and previous config saved to /var/cache/conftool/dbconfig/20230606-153139-ladsgroup.json
15:30 zabe@deploy1002: Finished scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) (duration: 08m 02s)
15:26 sukhe: homer "cr*-codfw*" commit "Gerrit: 927725 add new LVS host lvs2013" : T326767
15:24 zabe@deploy1002: robertsky and zabe: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:22 zabe@deploy1002: Started scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044)
15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2013
15:21 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2013
15:20 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48900 and previous config saved to /var/cache/conftool/dbconfig/20230606-151633-ladsgroup.json
15:12 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_esams and A:cp
15:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
15:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics@72d9b87]: (no justification provided) (duration: 00m 10s)
15:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@72d9b87]: (no justification provided)
15:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
15:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
15:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48899 and previous config saved to /var/cache/conftool/dbconfig/20230606-150141-ladsgroup.json
15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48898 and previous config saved to /var/cache/conftool/dbconfig/20230606-150120-ladsgroup.json
15:00 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
14:56 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
14:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
14:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48897 and previous config saved to /var/cache/conftool/dbconfig/20230606-144614-ladsgroup.json
14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48896 and previous config saved to /var/cache/conftool/dbconfig/20230606-143107-ladsgroup.json
14:25 oblivian@deploy1002: Finished scap: Backport for Load and enable parsoid everywhere (T334980) (duration: 15m 00s)
14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48895 and previous config saved to /var/cache/conftool/dbconfig/20230606-141601-ladsgroup.json
14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
14:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
14:12 oblivian@deploy1002: oblivian: Backport for Load and enable parsoid everywhere (T334980) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
14:10 oblivian@deploy1002: Started scap: Backport for Load and enable parsoid everywhere (T334980)
14:08 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
14:06 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
14:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
14:01 oblivian@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) (duration: 07m 57s)
14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48894 and previous config saved to /var/cache/conftool/dbconfig/20230606-140051-ladsgroup.json
14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48893 and previous config saved to /var/cache/conftool/dbconfig/20230606-140030-ladsgroup.json
13:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 780 hosts
13:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 780 hosts
13:58 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1259 hosts
13:57 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1259 hosts
13:55 oblivian@deploy1002: oblivian and daniel: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
13:53 oblivian@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366)
13:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
13:50 oblivian@deploy1002: Finished scap: Backport for Drop wmgMemoryLimitParsoid from IS.php (duration: 07m 21s)
13:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48891 and previous config saved to /var/cache/conftool/dbconfig/20230606-134524-ladsgroup.json
13:45 oblivian@deploy1002: oblivian: Backport for Drop wmgMemoryLimitParsoid from IS.php synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:43 oblivian@deploy1002: Started scap: Backport for Drop wmgMemoryLimitParsoid from IS.php
13:41 oblivian@deploy1002: Finished scap: Backport for Raise memory limit to match parsoid (T334980) (duration: 07m 53s)
13:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
13:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
13:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
13:35 oblivian@deploy1002: oblivian: Backport for Raise memory limit to match parsoid (T334980) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
13:33 oblivian@deploy1002: Started scap: Backport for Raise memory limit to match parsoid (T334980)
13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48890 and previous config saved to /var/cache/conftool/dbconfig/20230606-133018-ladsgroup.json
13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48889 and previous config saved to /var/cache/conftool/dbconfig/20230606-131512-ladsgroup.json
13:11 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
13:06 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Disable canary events and hadoop ingestion for development.network.probe - T332024 (duration: 07m 17s)
13:00 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48888 and previous config saved to /var/cache/conftool/dbconfig/20230606-125944-ladsgroup.json
12:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48887 and previous config saved to /var/cache/conftool/dbconfig/20230606-125923-ladsgroup.json
12:56 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_esams and A:cp
12:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
12:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48886 and previous config saved to /var/cache/conftool/dbconfig/20230606-124417-ladsgroup.json
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48885 and previous config saved to /var/cache/conftool/dbconfig/20230606-122911-ladsgroup.json
12:21 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 10s)
12:19 cgoubert@deploy1002: Started scap: (no justification provided)
12:19 claime: redeploying 927218 to mw-on-k8s - T338121
12:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48884 and previous config saved to /var/cache/conftool/dbconfig/20230606-121405-ladsgroup.json
12:09 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
12:00 kamila@deploy1002: Finished scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) (duration: 08m 54s)
11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48881 and previous config saved to /var/cache/conftool/dbconfig/20230606-115911-ladsgroup.json
11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48880 and previous config saved to /var/cache/conftool/dbconfig/20230606-115833-ladsgroup.json
11:53 kamila@deploy1002: kamila and klausman: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
11:51 kamila@deploy1002: Started scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48879 and previous config saved to /var/cache/conftool/dbconfig/20230606-114327-ladsgroup.json
11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48878 and previous config saved to /var/cache/conftool/dbconfig/20230606-112819-ladsgroup.json
11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48877 and previous config saved to /var/cache/conftool/dbconfig/20230606-111313-ladsgroup.json
11:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48876 and previous config saved to /var/cache/conftool/dbconfig/20230606-105756-ladsgroup.json
10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48875 and previous config saved to /var/cache/conftool/dbconfig/20230606-105724-ladsgroup.json
10:53 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
10:53 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
10:52 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
10:51 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) (duration: 07m 03s)
10:51 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
10:50 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
10:50 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
10:50 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
10:50 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
10:46 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
10:44 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954)
10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48874 and previous config saved to /var/cache/conftool/dbconfig/20230606-104218-ladsgroup.json
10:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
10:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
10:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48873 and previous config saved to /var/cache/conftool/dbconfig/20230606-102712-ladsgroup.json
10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:20 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.10 (duration: 02m 18s)
10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:17 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.12 refs T337526 (duration: 56m 25s)
10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48872 and previous config saved to /var/cache/conftool/dbconfig/20230606-101205-ladsgroup.json
10:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:02 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
10:01 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
10:00 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
09:59 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
09:58 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
09:58 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48871 and previous config saved to /var/cache/conftool/dbconfig/20230606-095512-ladsgroup.json
09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48870 and previous config saved to /var/cache/conftool/dbconfig/20230606-095451-ladsgroup.json
09:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
09:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48869 and previous config saved to /var/cache/conftool/dbconfig/20230606-093945-ladsgroup.json
09:34 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
09:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_esams and A:cp
09:27 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
09:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
09:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48867 and previous config saved to /var/cache/conftool/dbconfig/20230606-092439-ladsgroup.json
09:21 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.12 refs T337526
09:18 jynus: running systemctl start train-presync
09:16 vgutierrez: restarting acme-chief and nginx on acme-chief instances
09:11 claime: Building production images - T338014
09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48866 and previous config saved to /var/cache/conftool/dbconfig/20230606-090933-ladsgroup.json
08:59 urbanecm: deploy1002: run /usr/local/sbin/fix-staging-perms (T338205)
08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48865 and previous config saved to /var/cache/conftool/dbconfig/20230606-085337-ladsgroup.json
08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48864 and previous config saved to /var/cache/conftool/dbconfig/20230606-085317-ladsgroup.json
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48863 and previous config saved to /var/cache/conftool/dbconfig/20230606-083810-ladsgroup.json
08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48861 and previous config saved to /var/cache/conftool/dbconfig/20230606-082304-ladsgroup.json
08:15 moritzm: installing openssl security updates on bullseye
08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48860 and previous config saved to /var/cache/conftool/dbconfig/20230606-080758-ladsgroup.json
07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48859 and previous config saved to /var/cache/conftool/dbconfig/20230606-075210-ladsgroup.json
07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
07:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48858 and previous config saved to /var/cache/conftool/dbconfig/20230606-075149-ladsgroup.json
07:47 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_esams and A:cp
07:42 dcausse@deploy1002: Finished scap: Backport for ttm: use new config option to separate readable and writable services (T322284) (duration: 15m 20s)
07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48857 and previous config saved to /var/cache/conftool/dbconfig/20230606-073643-ladsgroup.json
07:28 dcausse@deploy1002: dcausse: Backport for ttm: use new config option to separate readable and writable services (T322284) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
07:27 dcausse@deploy1002: Started scap: Backport for ttm: use new config option to separate readable and writable services (T322284)
07:22 kharlan@deploy1002: Finished scap: Backport for checkuser: Disable client hints feature by default (T337944) (duration: 08m 14s)
07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48856 and previous config saved to /var/cache/conftool/dbconfig/20230606-072137-ladsgroup.json
07:16 kharlan@deploy1002: kharlan: Backport for checkuser: Disable client hints feature by default (T337944) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:14 kharlan@deploy1002: Started scap: Backport for checkuser: Disable client hints feature by default (T337944)
07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48855 and previous config saved to /var/cache/conftool/dbconfig/20230606-070631-ladsgroup.json
06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48854 and previous config saved to /var/cache/conftool/dbconfig/20230606-065057-ladsgroup.json
06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48853 and previous config saved to /var/cache/conftool/dbconfig/20230606-060807-ladsgroup.json
05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48852 and previous config saved to /var/cache/conftool/dbconfig/20230606-055301-ladsgroup.json
05:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
05:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
05:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 2518
05:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48851 and previous config saved to /var/cache/conftool/dbconfig/20230606-053755-ladsgroup.json
05:34 Amir1: ladsgroup@clouddb1021:/srv/sqldata.s1$ sudo rm db1196* (T337961)
05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48850 and previous config saved to /var/cache/conftool/dbconfig/20230606-052249-ladsgroup.json
05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48849 and previous config saved to /var/cache/conftool/dbconfig/20230606-051938-ladsgroup.json
05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48848 and previous config saved to /var/cache/conftool/dbconfig/20230606-051918-ladsgroup.json
05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48847 and previous config saved to /var/cache/conftool/dbconfig/20230606-050410-ladsgroup.json
04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48846 and previous config saved to /var/cache/conftool/dbconfig/20230606-044904-ladsgroup.json
04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48845 and previous config saved to /var/cache/conftool/dbconfig/20230606-043358-ladsgroup.json
04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48844 and previous config saved to /var/cache/conftool/dbconfig/20230606-043047-ladsgroup.json
04:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
04:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48843 and previous config saved to /var/cache/conftool/dbconfig/20230606-043026-ladsgroup.json
04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48842 and previous config saved to /var/cache/conftool/dbconfig/20230606-041520-ladsgroup.json
04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48841 and previous config saved to /var/cache/conftool/dbconfig/20230606-040013-ladsgroup.json
03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48840 and previous config saved to /var/cache/conftool/dbconfig/20230606-034506-ladsgroup.json
03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48839 and previous config saved to /var/cache/conftool/dbconfig/20230606-034256-ladsgroup.json
03:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
03:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48838 and previous config saved to /var/cache/conftool/dbconfig/20230606-034235-ladsgroup.json
03:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
03:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48837 and previous config saved to /var/cache/conftool/dbconfig/20230606-032729-ladsgroup.json
03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48836 and previous config saved to /var/cache/conftool/dbconfig/20230606-031223-ladsgroup.json
02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48835 and previous config saved to /var/cache/conftool/dbconfig/20230606-025717-ladsgroup.json
02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48834 and previous config saved to /var/cache/conftool/dbconfig/20230606-025507-ladsgroup.json
02:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48833 and previous config saved to /var/cache/conftool/dbconfig/20230606-021622-ladsgroup.json
02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48832 and previous config saved to /var/cache/conftool/dbconfig/20230606-020616-ladsgroup.json
02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48831 and previous config saved to /var/cache/conftool/dbconfig/20230606-020116-ladsgroup.json
01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48830 and previous config saved to /var/cache/conftool/dbconfig/20230606-015110-ladsgroup.json
01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48829 and previous config saved to /var/cache/conftool/dbconfig/20230606-014610-ladsgroup.json
01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48828 and previous config saved to /var/cache/conftool/dbconfig/20230606-013604-ladsgroup.json
01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48827 and previous config saved to /var/cache/conftool/dbconfig/20230606-013104-ladsgroup.json
01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48826 and previous config saved to /var/cache/conftool/dbconfig/20230606-012058-ladsgroup.json
01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48825 and previous config saved to /var/cache/conftool/dbconfig/20230606-010704-ladsgroup.json
01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48824 and previous config saved to /var/cache/conftool/dbconfig/20230606-010643-ladsgroup.json
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48823 and previous config saved to /var/cache/conftool/dbconfig/20230606-005357-ladsgroup.json
00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48822 and previous config saved to /var/cache/conftool/dbconfig/20230606-005336-ladsgroup.json
00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48821 and previous config saved to /var/cache/conftool/dbconfig/20230606-005137-ladsgroup.json
00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48820 and previous config saved to /var/cache/conftool/dbconfig/20230606-003830-ladsgroup.json
00:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48819 and previous config saved to /var/cache/conftool/dbconfig/20230606-003631-ladsgroup.json
00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48818 and previous config saved to /var/cache/conftool/dbconfig/20230606-002324-ladsgroup.json
00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48817 and previous config saved to /var/cache/conftool/dbconfig/20230606-002125-ladsgroup.json
00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48816 and previous config saved to /var/cache/conftool/dbconfig/20230606-001914-ladsgroup.json
00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48815 and previous config saved to /var/cache/conftool/dbconfig/20230606-001836-ladsgroup.json
00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48814 and previous config saved to /var/cache/conftool/dbconfig/20230606-000818-ladsgroup.json
00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48813 and previous config saved to /var/cache/conftool/dbconfig/20230606-000330-ladsgroup.json

2023-06-05

23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48812 and previous config saved to /var/cache/conftool/dbconfig/20230605-235346-ladsgroup.json
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48811 and previous config saved to /var/cache/conftool/dbconfig/20230605-235310-ladsgroup.json
23:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) (duration: 07m 02s)
23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48810 and previous config saved to /var/cache/conftool/dbconfig/20230605-234824-ladsgroup.json
23:43 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
23:42 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954)
23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48809 and previous config saved to /var/cache/conftool/dbconfig/20230605-233804-ladsgroup.json
23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48808 and previous config saved to /var/cache/conftool/dbconfig/20230605-233318-ladsgroup.json
23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48807 and previous config saved to /var/cache/conftool/dbconfig/20230605-233107-ladsgroup.json
23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48806 and previous config saved to /var/cache/conftool/dbconfig/20230605-233046-ladsgroup.json
23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
23:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48805 and previous config saved to /var/cache/conftool/dbconfig/20230605-232258-ladsgroup.json
23:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:22 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device ssw1-a1-codfw.mgmt.codfw.wmnet
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48804 and previous config saved to /var/cache/conftool/dbconfig/20230605-231540-ladsgroup.json
23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
23:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
23:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
23:11 jforrester@deploy1002: Finished deploy [integration/docroot@6eefe56]: I5c1b92 for T334492 (duration: 00m 05s)
23:10 jforrester@deploy1002: Started deploy [integration/docroot@6eefe56]: I5c1b92 for T334492
23:09 jforrester@deploy1002: Finished deploy [integration/docroot@ab77611]: Idf6c7a (duration: 00m 08s)
23:09 jforrester@deploy1002: Started deploy [integration/docroot@ab77611]: Idf6c7a
23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48803 and previous config saved to /var/cache/conftool/dbconfig/20230605-230752-ladsgroup.json
23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48802 and previous config saved to /var/cache/conftool/dbconfig/20230605-230034-ladsgroup.json
22:57 mutante: contint2001 - sudo systemctl restart apache2
22:57 mutante: contint2001 - sudo apt-get remove --purge libapache2-mod-php7.3 php7.3-cli php7.3-common php7.3-json php7.3-opcache php7.3-readline
22:55 jforrester@deploy1002: Finished deploy [integration/docroot@8255d99]: I6c7575 for T337425 (duration: 00m 13s)
22:55 jforrester@deploy1002: Started deploy [integration/docroot@8255d99]: I6c7575 for T337425
22:53 mutante: contint2001 (prod main CI server) - upgrading PHP 7.3 to 7.4
22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954) (duration: 09m 13s)
22:46 mutante: contint2002, contint1002 - upgrading PHP from 7.3 to 7.4
22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48801 and previous config saved to /var/cache/conftool/dbconfig/20230605-224528-ladsgroup.json
22:41 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in testwiki (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
22:40 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954)
22:37 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) (duration: 09m 04s)
22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48800 and previous config saved to /var/cache/conftool/dbconfig/20230605-223035-ladsgroup.json
22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
22:29 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:28 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700)
22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48799 and previous config saved to /var/cache/conftool/dbconfig/20230605-222745-ladsgroup.json
22:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
22:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
22:24 ladsgroup@deploy1002: Finished scap: Backport for Revert "Remove legacy encoding option from dawiktionary" (duration: 07m 40s)
22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48798 and previous config saved to /var/cache/conftool/dbconfig/20230605-222339-ladsgroup.json
22:18 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Remove legacy encoding option from dawiktionary" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
22:17 ladsgroup@deploy1002: Started scap: Backport for Revert "Remove legacy encoding option from dawiktionary"
22:13 ladsgroup@deploy1002: Finished scap: Backport for Help measure the impact of saneitizer jobs (T336698) (duration: 09m 48s)
22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48797 and previous config saved to /var/cache/conftool/dbconfig/20230605-220833-ladsgroup.json
22:05 ladsgroup@deploy1002: ladsgroup: Backport for Help measure the impact of saneitizer jobs (T336698) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
22:03 ladsgroup@deploy1002: Started scap: Backport for Help measure the impact of saneitizer jobs (T336698)
22:01 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1016.eqiad.wmnet
22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1016.eqiad.wmnet
21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48796 and previous config saved to /var/cache/conftool/dbconfig/20230605-215345-ladsgroup.json
21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48795 and previous config saved to /var/cache/conftool/dbconfig/20230605-215326-ladsgroup.json
21:51 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1016.eqiad.wmnet
21:50 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1016.eqiad.wmnet
21:42 urbanecm@deploy1002: Finished scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) (duration: 25m 38s)
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48794 and previous config saved to /var/cache/conftool/dbconfig/20230605-213839-ladsgroup.json
21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48793 and previous config saved to /var/cache/conftool/dbconfig/20230605-213819-ladsgroup.json
21:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1015.eqiad.wmnet
21:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48792 and previous config saved to /var/cache/conftool/dbconfig/20230605-213202-ladsgroup.json
21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
21:30 urbanecm@deploy1002: urbanecm: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
21:29 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
21:29 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
21:25 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48791 and previous config saved to /var/cache/conftool/dbconfig/20230605-212333-ladsgroup.json
21:23 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1015.eqiad.wmnet
21:18 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
21:17 urbanecm@deploy1002: Started scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085)
21:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (T338093) (duration: 24m 34s)
21:15 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48790 and previous config saved to /var/cache/conftool/dbconfig/20230605-210827-ladsgroup.json
21:05 urbanecm@deploy1002: urbanecm: Backport for Update interwiki cache (T338093) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:51 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache (T338093)
20:48 cjming: end of UTC late backport window
20:47 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # verify T338180 fix
away: payments-wiki upgraded from 2b4203df to f3b229c6
20:46 cjming@deploy1002: Finished scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" (duration: 09m 57s)
20:38 cjming@deploy1002: cjming: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
20:36 cjming@deploy1002: Started scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere""
20:35 cjming@deploy1002: Finished scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) (duration: 24m 57s)
20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48789 and previous config saved to /var/cache/conftool/dbconfig/20230605-202916-ladsgroup.json
20:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48788 and previous config saved to /var/cache/conftool/dbconfig/20230605-202855-ladsgroup.json
20:23 cjming@deploy1002: cjming: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48787 and previous config saved to /var/cache/conftool/dbconfig/20230605-201349-ladsgroup.json
20:10 cjming@deploy1002: Started scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355)
20:09 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # attempt to fix permission errors when doing a backport
19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48786 and previous config saved to /var/cache/conftool/dbconfig/20230605-195842-ladsgroup.json
19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48785 and previous config saved to /var/cache/conftool/dbconfig/20230605-194336-ladsgroup.json
19:32 brett: Maglev LVS scheduler rollout in eqiad finished (puppet re-enabled) - T263797
19:12 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2011.codfw.wmnet
19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48784 and previous config saved to /var/cache/conftool/dbconfig/20230605-190702-ladsgroup.json
19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48783 and previous config saved to /var/cache/conftool/dbconfig/20230605-190528-ladsgroup.json
19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
19:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
19:03 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
18:58 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2011.codfw.wmnet
18:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
18:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: revert - remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 11m 54s)
18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48782 and previous config saved to /var/cache/conftool/dbconfig/20230605-185156-ladsgroup.json
18:48 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
18:48 inflatador: bking@cumin1001 depooling wdqs2011 for fw update T331297
18:48 inflatador: bking@cumin1001 repooling wdqs2010 T331297
18:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48781 and previous config saved to /var/cache/conftool/dbconfig/20230605-183650-ladsgroup.json
18:35 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
18:32 inflatador: bking@cumin1001 depooling wdqs2010 for fw update T331297
18:30 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: revert - Remove unused page_change rc streams - T336817 (duration: 11m 23s)
18:29 sukhe: homer "cr*-eqiad*" commit "Gerrit: 927246 remove old gerrit service IP"
18:28 brett: Maglev LVS scheduler rollout in eqiad (puppet disabled) - T263797
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48780 and previous config saved to /var/cache/conftool/dbconfig/20230605-182144-ladsgroup.json
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48779 and previous config saved to /var/cache/conftool/dbconfig/20230605-181935-ladsgroup.json
18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48778 and previous config saved to /var/cache/conftool/dbconfig/20230605-181915-ladsgroup.json
18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48777 and previous config saved to /var/cache/conftool/dbconfig/20230605-181219-ladsgroup.json
18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48776 and previous config saved to /var/cache/conftool/dbconfig/20230605-180408-ladsgroup.json
17:58 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
17:58 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48775 and previous config saved to /var/cache/conftool/dbconfig/20230605-175712-ladsgroup.json
17:50 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: no-op: Remove unused page_change rc streams - T336817 (duration: 20m 11s)
17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48774 and previous config saved to /var/cache/conftool/dbconfig/20230605-174902-ladsgroup.json
17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48773 and previous config saved to /var/cache/conftool/dbconfig/20230605-174206-ladsgroup.json
17:38 cdanis@deploy1002: Finished scap: Backport for Enable user network probe events (T332024) (duration: 10m 02s)
17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48772 and previous config saved to /var/cache/conftool/dbconfig/20230605-173356-ladsgroup.json
17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48771 and previous config saved to /var/cache/conftool/dbconfig/20230605-173002-ladsgroup.json
17:30 cdanis@deploy1002: cdanis: Backport for Enable user network probe events (T332024) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
17:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48770 and previous config saved to /var/cache/conftool/dbconfig/20230605-172942-ladsgroup.json
17:28 cdanis@deploy1002: Started scap: Backport for Enable user network probe events (T332024)
17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48769 and previous config saved to /var/cache/conftool/dbconfig/20230605-172700-ladsgroup.json
17:26 cdanis@deploy1002: Backport cancelled.
17:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 09m 25s)
17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48768 and previous config saved to /var/cache/conftool/dbconfig/20230605-172124-ladsgroup.json
17:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
17:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48767 and previous config saved to /var/cache/conftool/dbconfig/20230605-172103-ladsgroup.json
17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48766 and previous config saved to /var/cache/conftool/dbconfig/20230605-171436-ladsgroup.json
17:12 cdanis@deploy1002: Backport cancelled.
17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48765 and previous config saved to /var/cache/conftool/dbconfig/20230605-170557-ladsgroup.json
16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48764 and previous config saved to /var/cache/conftool/dbconfig/20230605-165929-ladsgroup.json
16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48763 and previous config saved to /var/cache/conftool/dbconfig/20230605-165051-ladsgroup.json
16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48762 and previous config saved to /var/cache/conftool/dbconfig/20230605-164423-ladsgroup.json
16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48761 and previous config saved to /var/cache/conftool/dbconfig/20230605-163714-ladsgroup.json
16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48760 and previous config saved to /var/cache/conftool/dbconfig/20230605-163653-ladsgroup.json
16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48759 and previous config saved to /var/cache/conftool/dbconfig/20230605-163545-ladsgroup.json
16:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48758 and previous config saved to /var/cache/conftool/dbconfig/20230605-162707-ladsgroup.json
16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48757 and previous config saved to /var/cache/conftool/dbconfig/20230605-162629-ladsgroup.json
16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48756 and previous config saved to /var/cache/conftool/dbconfig/20230605-162147-ladsgroup.json
16:21 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
16:19 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
16:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48755 and previous config saved to /var/cache/conftool/dbconfig/20230605-161123-ladsgroup.json
16:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48754 and previous config saved to /var/cache/conftool/dbconfig/20230605-160640-ladsgroup.json
16:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
16:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
16:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
16:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
16:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:59 bblack: mw1419: manually executing a php restart to test new safe-service-restart
15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48753 and previous config saved to /var/cache/conftool/dbconfig/20230605-155617-ladsgroup.json
15:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48752 and previous config saved to /var/cache/conftool/dbconfig/20230605-155134-ladsgroup.json
15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2013']
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48751 and previous config saved to /var/cache/conftool/dbconfig/20230605-154926-ladsgroup.json
15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48750 and previous config saved to /var/cache/conftool/dbconfig/20230605-154905-ladsgroup.json
15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48749 and previous config saved to /var/cache/conftool/dbconfig/20230605-154110-ladsgroup.json
15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2013']
15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48748 and previous config saved to /var/cache/conftool/dbconfig/20230605-153542-ladsgroup.json
15:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
15:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48747 and previous config saved to /var/cache/conftool/dbconfig/20230605-153521-ladsgroup.json
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48746 and previous config saved to /var/cache/conftool/dbconfig/20230605-153359-ladsgroup.json
15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
15:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
15:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
15:27 Amir1: on s3 master: update `text` set old_text = 'O:18:"historyblobcurstub":1:{s:6:"mCurId";i:5532;}', old_flags = 'object' where old_id= 14484; (T337700)
15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48745 and previous config saved to /var/cache/conftool/dbconfig/20230605-152015-ladsgroup.json
15:19 moritzm: installing debian-archive-keyring updates on bullseye hosts
15:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@674ec0a]: (no justification provided) (duration: 00m 17s)
15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48744 and previous config saved to /var/cache/conftool/dbconfig/20230605-151853-ladsgroup.json
15:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@674ec0a]: (no justification provided)
15:18 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767 (duration: 102m 46s)
15:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
15:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
15:05 moritzm: installing avahi security updates
15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48742 and previous config saved to /var/cache/conftool/dbconfig/20230605-150509-ladsgroup.json
15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48741 and previous config saved to /var/cache/conftool/dbconfig/20230605-150347-ladsgroup.json
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48740 and previous config saved to /var/cache/conftool/dbconfig/20230605-150138-ladsgroup.json
15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48739 and previous config saved to /var/cache/conftool/dbconfig/20230605-150117-ladsgroup.json
14:55 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:52 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:50 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48738 and previous config saved to /var/cache/conftool/dbconfig/20230605-145003-ladsgroup.json
14:48 sukhe: homer "cr*-codfw*" commit "Gerrit: 927208 remove decommissioned host lvs2009": T335777
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2009.codfw.wmnet
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48737 and previous config saved to /var/cache/conftool/dbconfig/20230605-144611-ladsgroup.json
14:45 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48736 and previous config saved to /var/cache/conftool/dbconfig/20230605-144438-ladsgroup.json
14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48735 and previous config saved to /var/cache/conftool/dbconfig/20230605-144417-ladsgroup.json
14:42 sukhe@cumin2002: START - Cookbook sre.dns.netbox
14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2009.codfw.wmnet
14:31 ejegg: payments-wiki upgraded from c2f9f8b5 to 2b4203df
14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48734 and previous config saved to /var/cache/conftool/dbconfig/20230605-143105-ladsgroup.json
14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48733 and previous config saved to /var/cache/conftool/dbconfig/20230605-142911-ladsgroup.json
14:28 sukhe: codfw low-traffic LVS: set routing-options static route 10.2.1.0/24 next-hop 10.192.49.7
14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48732 and previous config saved to /var/cache/conftool/dbconfig/20230605-141559-ladsgroup.json
14:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:15 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48731 and previous config saved to /var/cache/conftool/dbconfig/20230605-141451-ladsgroup.json
14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48730 and previous config saved to /var/cache/conftool/dbconfig/20230605-141430-ladsgroup.json
14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48729 and previous config saved to /var/cache/conftool/dbconfig/20230605-141405-ladsgroup.json
14:08 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:08 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48728 and previous config saved to /var/cache/conftool/dbconfig/20230605-135924-ladsgroup.json
13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48727 and previous config saved to /var/cache/conftool/dbconfig/20230605-135859-ladsgroup.json
13:57 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:56 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48726 and previous config saved to /var/cache/conftool/dbconfig/20230605-135332-ladsgroup.json
13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48725 and previous config saved to /var/cache/conftool/dbconfig/20230605-135311-ladsgroup.json
13:46 moritzm: installing python-ipaddress security updates
13:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
13:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48724 and previous config saved to /var/cache/conftool/dbconfig/20230605-134418-ladsgroup.json
13:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
13:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48723 and previous config saved to /var/cache/conftool/dbconfig/20230605-134313-ladsgroup.json
13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48722 and previous config saved to /var/cache/conftool/dbconfig/20230605-133805-ladsgroup.json
13:36 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767
13:35 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937 (duration: 01m 06s)
13:35 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
13:35 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs (duration: 05m 54s)
13:32 bblack: lvs1* (eqiad) - restart pybal for T334703 IPs
13:29 bblack: lvs2* (codfw) - restart pybal for T334703 IPs
13:29 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs
13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48721 and previous config saved to /var/cache/conftool/dbconfig/20230605-132911-ladsgroup.json
13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48720 and previous config saved to /var/cache/conftool/dbconfig/20230605-132807-ladsgroup.json
13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48719 and previous config saved to /var/cache/conftool/dbconfig/20230605-132703-ladsgroup.json
13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48718 and previous config saved to /var/cache/conftool/dbconfig/20230605-132642-ladsgroup.json
13:25 hashar: Restarted Zuul due to stalled SSH connection # T309376
13:25 bblack: lvs3* (esams) - restart pybal for T334703 IPs
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48717 and previous config saved to /var/cache/conftool/dbconfig/20230605-132259-ladsgroup.json
13:19 bblack: lvs5* (eqsin) - restart pybal for T334703 IPs
13:17 Lucas_WMDE: UTC afternoon backport+config window done
13:15 bblack: lvs6* (drmrs) - restart pybal for T334703 IPs
13:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140) (duration: 10m 06s)
13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48716 and previous config saved to /var/cache/conftool/dbconfig/20230605-131301-ladsgroup.json
13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48715 and previous config saved to /var/cache/conftool/dbconfig/20230605-131136-ladsgroup.json
13:09 bblack: lvs4* (ulsfo) - restart pybal for T334703 IPs
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48714 and previous config saved to /var/cache/conftool/dbconfig/20230605-130753-ladsgroup.json
13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make outreachwiki a multilingual Wikidata client (T171140) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140)
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48713 and previous config saved to /var/cache/conftool/dbconfig/20230605-130228-ladsgroup.json
13:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48712 and previous config saved to /var/cache/conftool/dbconfig/20230605-125754-ladsgroup.json
12:56 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48711 and previous config saved to /var/cache/conftool/dbconfig/20230605-125630-ladsgroup.json
12:52 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
12:51 Amir1: killed prioritizeFilesWithTemplate.php, stopping depool maint.
12:49 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
12:44 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48710 and previous config saved to /var/cache/conftool/dbconfig/20230605-124444-ladsgroup.json
12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
12:43 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48709 and previous config saved to /var/cache/conftool/dbconfig/20230605-124124-ladsgroup.json
12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48708 and previous config saved to /var/cache/conftool/dbconfig/20230605-123915-ladsgroup.json
12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:39 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
12:17 jynus: creating a copy of db1157 binlogs on dbprov1004 T338128
12:15 bblack: lvs*: disabling puppet to roll out new LVS IPs in https://gerrit.wikimedia.org/r/c/operations/puppet/+/924593 - T334703
12:15 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
11:46 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
11:45 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
11:39 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
11:21 moritzm: restarting Exim on MXes to pick up OpenSSL updates
11:15 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
11:13 moritzm: bounced ferm on ml-serve2006 (race caused by firewall profile change)
11:08 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
10:29 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
10:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
10:11 moritzm: installing openssl security updates on Bullseye
10:08 aborrero@cumin1001: START - Cookbook sre.dns.netbox
10:06 godog: truncate xff.log and JobExecutor.log on mwlog1002 to reclaim space - T338127
09:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
09:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
09:39 claime: roll-restart thumbor in eqiad - T337649
09:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
09:38 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=thumbor.*
09:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
09:37 claime: roll-restart thumbor in codfw - T337649
08:40 claime: power-cycling restbase1027 - T338122
07:54 moritzm: installing containerd security updates
07:38 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) (duration: 09m 58s)
07:30 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:28 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669)
07:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
07:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
07:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
07:21 taavi@deploy1002: Finished scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) (duration: 18m 27s)
07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
07:12 taavi@deploy1002: mlitn and taavi: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
07:02 taavi@deploy1002: Started scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870)
06:20 _joe_: killing a pod with consistently high haproxy queue for thumbor in codfw
06:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 60427
06:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 60427

2023-06-03

13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet

2023-06-02

20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups
16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed
15:35 sukhe: restart ntp.service on dns1002
13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479
13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release
09:35 moritzm: installing texlive-security updates on buster
09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) T337836. Reload systemd for unit changes to take effect
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet
08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet
08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet
08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet
08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts:
08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts:
08:42 moritzm: installing traceroute bugfix updates from Bullseye point release
07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org
07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org
07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
01:53 ejegg: fundraising python tools upgraded from 759d4c89 to 2ca83336
01:22 cstone: civicrm upgraded from 3819d6d1 to bcc8fccc

2023-06-01

21:06 samtar@deploy1002: Finished scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) (duration: 08m 30s)
20:59 samtar@deploy1002: esanders and samtar: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
20:57 samtar@deploy1002: Started scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955)
20:54 samtar@deploy1002: Finished scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) (duration: 10m 29s)
20:45 samtar@deploy1002: samtar and ksarabia: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:44 samtar@deploy1002: Started scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955)
20:21 samtar@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on enwiki (T336092) (duration: 07m 56s)
20:15 samtar@deploy1002: dani and samtar: Backport for Deploy Research Incentive survey on enwiki (T336092) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:13 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive survey on enwiki (T336092)
20:12 samtar@deploy1002: Finished scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) (duration: 08m 20s)
20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:04 samtar@deploy1002: Started scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726)
19:51 ejegg: fundraising python tools upgraded from 72570bdd to 759d4c89
19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s)
19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided)
19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s)
19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - T334703
19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work
18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - T334703
18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - T334703
18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs T337525
17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s)
17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp
17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided)
17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp
17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove unneeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - T336817 (duration: 07m 24s)
16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting - T336817 (duration: 08m 18s)
16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - T334703
16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - T334703
16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703 (round 2!)
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - T337041
16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook
15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703
15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes
15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye
14:59 moritzm: installing python-sqlparse security updates
14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
14:53 moritzm: installing jackson-databind security updates
14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook
14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp
14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp
14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
14:29 moritzm: installing imagemagick security updates on buster
14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye
14:14 fabfur: Disabled puppet on A:cp-drmrs for T323557
14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s)
14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided)
14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json
14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - T337505
13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json
13:52 moritzm: installing sysstat security updates
13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json
13:29 moritzm: installing openssl security updates on bullseye
13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json
13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json
13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json
13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - T337505
13:13 urbanecm@deploy1002: Finished scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) (duration: 11m 08s)
13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json
13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
13:02 urbanecm@deploy1002: Started scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)
12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json
12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json
12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json
12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json
12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json
12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # T336365
11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json
11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json
11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json
11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json
11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json
11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json
11:04 jayme: disabling puppet on all kubernetes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707
10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json
10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json
10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json
10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json
10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json
10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox
10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json
10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json
10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
09:56 moritzm: installing systemd security updates on bullseye
09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G
09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev']
09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev']
09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
09:32 volans: installed spicerack v7.2.0 on cumin2002
09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
09:18 godog: remove lv prometheus-global - T288196
09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
09:12 volans: installed spicerack v7.2.0 on cumin1001
09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox
08:48 apergos: UTC morning backport and config training window done
08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:28 daniel@deploy1002: Finished scap: Backport for ORES: add model versions configuration and thresholds (T319170) (duration: 10m 12s)
08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:19 daniel@deploy1002: daniel and isaranto: Backport for ORES: add model versions configuration and thresholds (T319170) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
08:18 daniel@deploy1002: Started scap: Backport for ORES: add model versions configuration and thresholds (T319170)
07:55 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) (duration: 09m 09s)
07:48 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:46 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366)
07:42 mlitn@deploy1002: Finished scap: Backport for Add $wgInterwikiLogoOverride (T315269) (duration: 33m 02s)
07:35 moritzm: installing libssh security updates
07:29 mlitn@deploy1002: mlitn: Backport for Add $wgInterwikiLogoOverride (T315269) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
07:09 mlitn@deploy1002: Started scap: Backport for Add $wgInterwikiLogoOverride (T315269)
06:16 kart_: Updated MinT to 2023-06-01-041041-production (T336525)
06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
05:39 kart_: Updated cxserver to 2023-06-01-041016-production (T337669)
05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
00:11 eileen: civicrm upgraded from 885208ca to 3819d6d1

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2010s

2020s