Server Admin Log/Archive 67

From Wikitech

2023-06-30

  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2013.*
  • 22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2014.*
  • 22:20 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2015.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2016.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2017.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2018.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2019.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2020.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 22:19 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
  • 22:09 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
  • 22:08 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 22:08 bking@deploy1002: deploy aborted: 0.3.124 (duration: 00m 00s)
  • 22:08 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 22:00 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 22:00 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 21:58 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:07 jhathaway: debugging a cert issue on pki1001.eqiad
  • 21:03 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 21:00 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:59 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:29 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:29 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:55 mutante: please hold code changes and deploys if using gitlab - upgrade in progress
  • 19:53 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 19:26 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:25 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:25 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:25 brennen@deploy1002: Finished scap: Backport for Fix bug in opening dialog (T340816) (duration: 08m 37s)
  • 18:20 mutante: upgrading gitlab on gitlab-replica.wikimedia.org
  • 18:19 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
  • 18:18 brennen@deploy1002: brennen and jforrester: Backport for Fix bug in opening dialog (T340816) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 18:16 brennen@deploy1002: Started scap: Backport for Fix bug in opening dialog (T340816)
  • 18:06 dzahn@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 16:59 dzahn@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 16:27 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 16:27 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 16:26 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 16:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 16:25 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 16:25 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 16:09 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:50 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1149.eqiad.wmnet with OS bullseye
  • 15:35 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:35 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:21 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:43 jiji@cumin1001: conftool action : get; selector: service=kube-apiserver
  • 14:42 sbassett: Deployed updated mitigation for T337593
  • 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
  • 14:14 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 13:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 12:39 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 12:30 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 12:22 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
  • 12:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 12:17 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 12:17 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 12:16 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 12:10 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:09 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:03 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 11:59 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 11:54 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 11:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 11:39 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
  • 11:38 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 11:36 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
  • 11:31 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 11:28 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 11:28 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 11:28 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 11:23 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 11:23 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
  • 11:22 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 11:15 jayme: published image docker-registry.discovery.wmnet/envoy:1.18.3-2-s3 and docker-registry.discovery.wmnet/envoy-future:1.23.10-1-s1 - T300324
  • 11:14 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
  • 11:14 jayme: imported envoyproxy 1.23.10 to component/envoy-future in buster-wikimedia - T300324
  • 11:05 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 11:05 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 11:05 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 11:05 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
  • 11:04 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 10:45 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 10:24 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:22 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:20 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:15 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 10:15 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
  • 10:14 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 10:13 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['sretest1003']
  • 10:12 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 08:00 jayme: rolled back envoyproxy package in buster-wikimedia component/envoy-future to 1.18.3-1 - T300324
  • 07:52 jayme: removed docker-registry.discovery.wmnet/envoy-future:1.26.1-1 - T300324
  • 06:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on urldownloader[2001-2002].wikimedia.org with reason: pending decom
  • 06:41 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on urldownloader[2001-2002].wikimedia.org with reason: pending decom
  • 06:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on urldownloader[1001-1002].wikimedia.org with reason: Setup in progress
  • 06:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on urldownloader[1001-1002].wikimedia.org with reason: Setup in progress

2023-06-29

  • 21:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:25 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 21:22 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 21:18 samtar@deploy1002: Finished scap: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763) (duration: 08m 26s)
  • 21:11 samtar@deploy1002: samtar: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:10 samtar@deploy1002: Started scap: Backport for IS: Phonos, reorder and enable for mediawikiwiki (T336763)
  • 20:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 20:01 mutante: contint* servers: restarted apache after deploying gerrit:932435
  • 19:50 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 19:48 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:30 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 19:30 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 19:29 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 19:29 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 19:29 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 19:28 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 19:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 19:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 19:10 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 19:10 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 18:37 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Restarting to upgraded JVM - eevans@cumin1001
  • 18:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[1-3]*: Restarting to upgraded JVM - eevans@cumin1001
  • 18:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye
  • 18:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Restarting to upgraded JVM - eevans@cumin1001
  • 18:16 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.15 refs T340243
  • 18:15 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[1-3]*: Restarting to upgraded JVM - eevans@cumin1001
  • 18:06 brennen: train 1.41.0-wmf.15 (T340243): no current blockers, logs calm, rolling to all wikis
  • 17:46 taavi@deploy1002: Finished scap: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757) (duration: 11m 06s)
  • 17:38 taavi@deploy1002: matmarex and taavi: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 17:35 taavi@deploy1002: Started scap: Backport for Revert "Add extends warning to reference dialog" (T247922 T340757)
  • 17:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:09 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 17:07 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:06 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:05 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:04 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:55 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
  • 16:51 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1002.eqiad.wmnet with reason: host reimage
  • 16:50 jiji@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kubestagemaster2002.codfw.wmnet
  • 16:50 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 16:41 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 16:38 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 16:35 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 16:22 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 16:21 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 16:18 mutante: releases1003 - re-enabling puppet after recent webserver debugging
  • 16:18 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 16:17 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 16:16 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
  • 16:16 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
  • 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 16:12 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 16:11 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 16:10 sukhe: systemctl restart bird.service on doh2002
  • 16:04 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 16:04 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 16:04 klausman@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:03 klausman@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:03 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 15:59 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2002.codfw.wmnet
  • 15:49 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs and A:cp
  • 15:49 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs and A:cp
  • 15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:48 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1486.eqiad.wmnet
  • 15:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1486.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1485.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1485.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1484.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1484.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1483.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1483.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1482.eqiad.wmnet
  • 15:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1482.eqiad.wmnet
  • 15:31 claime: Pooled mw148[2-6].eqiad.wmnet as jobrunners - T329366
  • 15:29 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw148[2-6].eqiad.wmnet,cluster=jobrunner
  • 15:27 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 15:25 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw148[2-6].eqiad.wmnet
  • 15:25 cgoubert@cumin1001: conftool action : set/weight=10; selector: name=mw148[2-6].eqiad.wmnet
  • 15:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1484.eqiad.wmnet with OS buster
  • 15:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1485.eqiad.wmnet with OS buster
  • 15:19 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1483.eqiad.wmnet with OS buster
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1482.eqiad.wmnet with OS buster
  • 15:16 moritzm: installing Java 8 security updates on sessionstore/codfw
  • 15:06 Daimona: Creating new DB tables for the CampaignEvents extension in x1.testwiki, x1.test2wiki, x1.officewiki, and x1.wikishared # T340000
  • 14:54 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on mw1486.eqiad.wmnet with reason: host reimage
  • 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
  • 14:51 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
  • 14:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
  • 14:47 jayme: published image docker-registry.discovery.wmnet/envoy-future:1.26.1-1 - T300324
  • 14:46 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1482.eqiad.wmnet with reason: host reimage
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1485.eqiad.wmnet with reason: host reimage
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1486.eqiad.wmnet with reason: host reimage
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1484.eqiad.wmnet with reason: host reimage
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1483.eqiad.wmnet with reason: host reimage
  • 14:44 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1482.eqiad.wmnet with reason: host reimage
  • 14:31 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host mw1484.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1486.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1485.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1484.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1483.eqiad.wmnet with OS buster
  • 14:31 cgoubert@cumin1001: START - Cookbook sre.hosts.reimage for host mw1482.eqiad.wmnet with OS buster
  • 14:30 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw148[2-6].eqiad.wmnet
  • 14:21 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: pick up Java 8 sec updates - jmm@cumin2002
  • 14:20 claime: Depooling mw148[2-6].eqiad.wmnet from api_appserver to move them to jobrunners - T329366
  • 14:19 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster2002.codfw.wmnet
  • 14:19 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
  • 14:19 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
  • 14:19 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 14:18 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 14:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 14:13 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:11 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster2002.codfw.wmnet on all recursors
  • 14:10 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster2002.codfw.wmnet on all recursors
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:10 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 14:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 14:10 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster2002.codfw.wmnet - jiji@cumin1001"
  • 14:07 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 14:07 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster2002.codfw.wmnet
  • 14:04 jayme: imported envoyproxy 1.26.1 to component/envoy-future in buster-wikimedia - T300324
  • 14:04 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:03 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:02 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:02 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:02 taavi: UTC afternoon backports done
  • 14:01 taavi@deploy1002: Finished scap: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" (duration: 12m 01s)
  • 14:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:00 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 13:51 taavi@deploy1002: taavi and reedy: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:49 taavi@deploy1002: Started scap: Backport for Fix trying to get a PageRecord for a non-existent page (T340568), Revert "Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace"
  • 13:44 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:44 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:43 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:40 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:40 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:40 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:39 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:38 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 moritzm: installing bind9 security updates (tools/libs only)
  • 13:36 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:35 daniel@deploy1002: Finished scap: Backport for Disable PC writes for parsoid endpoints (T339867) (duration: 07m 07s)
  • 13:32 moritzm: failover ganeti master in codfw to ganeti2020
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 13:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 13:29 daniel@deploy1002: daniel: Backport for Disable PC writes for parsoid endpoints (T339867) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:28 daniel@deploy1002: Started scap: Backport for Disable PC writes for parsoid endpoints (T339867)
  • 13:27 taavi@deploy1002: Finished scap: Backport for Only send 1 suggestion per section (duration: 07m 08s)
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 13:22 taavi@deploy1002: mlitn and taavi: Backport for Only send 1 suggestion per section synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:20 taavi@deploy1002: Started scap: Backport for Only send 1 suggestion per section
  • 13:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 13:14 taavi@deploy1002: Finished scap: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697) (duration: 09m 05s)
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 13:10 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:10 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:07 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:07 taavi@deploy1002: taavi and func: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:07 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:06 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 13:05 taavi@deploy1002: Started scap: Backport for Remove $wgNamespacesWithSubpages overrides on the MediaWiki namespace (T340697)
  • 13:05 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:04 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:03 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 13:00 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:00 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 12:58 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: name=parse1002.eqiad.wmnet
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 12:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 12:56 akosiaris@cumin1001: conftool action : set/pooled=no; selector: name=parse1002.eqiad.wmnet
  • 12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1002.eqiad.wmnet with OS buster
  • 12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1001"
  • 12:53 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 12:50 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: pick up Java 8 sec updates - jmm@cumin2002
  • 12:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 12:48 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:46 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:46 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 12:43 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 12:42 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 12:42 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: pick up Java 8 sec updates - jmm@cumin2002
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 12:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2013.codfw.wmnet
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2013.codfw.wmnet
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 11:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 11:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2014.codfw.wmnet
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
  • 11:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 11:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 11:31 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 11:21 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudservices2005-dev.wikimedia.org
  • 11:21 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudservices2005-dev.wikimedia.org
  • 11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster1002.eqiad.wmnet
  • 11:20 jiji@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 11:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 11:09 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 11:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 11:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: pick up Java 8 sec updates - jmm@cumin2002
  • 11:02 akosiaris@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1001"
  • 11:02 moritzm: installing Java 8 security updates
  • 11:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 10:59 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:59 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:57 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:52 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2017.codfw.wmnet
  • 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
  • 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 10:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 10:40 claime: vrt-wiki.wikimedia.org now hosted on mw-on-k8s - T340549
  • 10:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:37 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 10:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
  • 10:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
  • 10:34 claime: Running puppet on cp-text trafficservers - T340549
  • 10:32 claime: Redirect vrt-wiki.wikimedia.org to mw-on-k8s - T340549
  • 10:32 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1002.eqiad.wmnet with reason: host reimage
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 10:25 claime: office.wikimedia.org now hosted on mw-on-k8s - T337490
  • 10:25 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster1002.eqiad.wmnet with OS bullseye
  • 10:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 10:25 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
  • 10:24 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
  • 10:24 jiji@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kubestagemaster1002.eqiad.wmnet on all recursors
  • 10:23 jiji@cumin1001: START - Cookbook sre.dns.wipe-cache kubestagemaster1002.eqiad.wmnet on all recursors
  • 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:23 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
  • 10:23 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kubestagemaster1002.eqiad.wmnet - jiji@cumin1001"
  • 10:21 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 10:21 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster1002.eqiad.wmnet
  • 10:20 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:20 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Delete records created by accident - jiji@cumin1001"
  • 10:19 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Delete records created by accident - jiji@cumin1001"
  • 10:19 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host parse1002.eqiad.wmnet with OS buster
  • 10:18 claime: Running puppet on cp-text trafficservers - T337490
  • 10:18 claime: Redirect office.wikimedia.org to mw-on-k8s - T337490
  • 10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 10:17 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 10:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 10:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:09 jbond: puppetserver1001 added back to puppet-merge
  • 10:09 claime: www.mediawiki.org now hosted on mw-on-k8s - T337490
  • 10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host kubestagemaster1002.eqiad.wmnet
  • 10:08 jiji@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 10:06 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 10:06 jiji@cumin1001: START - Cookbook sre.ganeti.makevm for new host kubestagemaster1002.eqiad.wmnet
  • 10:03 claime: Running puppet on cp-text trafficservers - T337490
  • 10:02 claime: Redirect www.mediawiki.org to mw-on-k8s - T337490
  • 10:00 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad and A:cp
  • 09:59 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad and A:cp
  • 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:58 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for wikikube-staging masters - jiji@cumin1001"
  • 09:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:57 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for wikikube-staging masters - jiji@cumin1001"
  • 09:57 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:53 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 09:53 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 09:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 09:43 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 09:36 moritzm: restarting FPM on mw canaries to pick up libx11 updates
  • 09:30 moritzm: installing libx11 security updates
  • 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 09:22 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams and A:cp
  • 09:21 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams and A:cp
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 09:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:58 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2026.codfw.wmnet
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2026.codfw.wmnet
  • 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2026.codfw.wmnet
  • 08:25 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2026.codfw.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 08:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 08:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 08:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 07:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2029.codfw.wmnet
  • 07:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2029.codfw.wmnet
  • 07:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2029.codfw.wmnet
  • 06:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 01:33 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 01:32 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-06-28

  • 22:51 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 22:50 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 21:14 eileen_: civicrm upgraded from 0a59d203 to 9e04c92d
  • 20:10 brennen@deploy1002: Finished scap: Backport for Revert "Deprecate use of targets" (duration: 07m 23s)
  • 20:05 brennen@deploy1002: jdlrobson and brennen: Backport for Revert "Deprecate use of targets" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:03 brennen@deploy1002: Started scap: Backport for Revert "Deprecate use of targets"
  • 19:46 brennen: train 1.41.0-wmf.15 (T340243): deploying a revert for T127268 related deprecation logspam - this is likely to impinge on upcoming backport window, which currently has no patches. will update when finished.
  • 19:13 mutante: contint1002,2002,2001 - sudo chmod -R g-w /etc/zuul/wikimedia while deploying gerrit:927980 for T338277
  • 19:03 mutante: contint* - temporarily disabled puppet - deploying gerrit:927980 - related to git cloning zuul config on CI servers
  • 18:20 brennen@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.15 refs T340243 (duration: 06m 18s)
  • 18:14 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.15 refs T340243
  • 18:02 brennen: train 1.41.0-wmf.15 (T340243): no current blockers, rolling to group1.
  • 17:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-test-coord1002.eqiad.wmnet
  • 17:15 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-test-coord1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS bullseye
  • 17:06 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-test-coord1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:03 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:57 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-test-coord1002.eqiad.wmnet
  • 16:48 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 16:46 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 16:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 16:34 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:33 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS bullseye
  • 16:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS bullseye
  • 16:23 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:23 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:22 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:22 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:20 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:19 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:18 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:11 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:10 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 16:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:10 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:07 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 16:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:54 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS bullseye
  • 15:54 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:53 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 15:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 15:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1001.eqiad.wmnet with OS bullseye
  • 15:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 15:40 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2031.codfw.wmnet
  • 15:40 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 15:37 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:37 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 15:34 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti2032.codfw.wmnet
  • 15:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:32 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 15:31 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:31 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1001.eqiad.wmnet with reason: host reimage
  • 15:29 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:28 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 15:23 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 15:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 15:16 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1001.eqiad.wmnet with OS bullseye
  • 15:08 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:08 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:06 akosiaris: Disable Vodafone DE BGP peering on cr2-esams to troubleshoot reports from users in Germany
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3001.esams.wmnet
  • 14:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3001.esams.wmnet
  • 14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3001.esams.wmnet
  • 14:19 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:08 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 14:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 14:06 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 14:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:04 reedy@deploy1002: Synchronized wmf-config/: Various changes (duration: 06m 27s)
  • 14:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:57 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:57 reedy@deploy1002: Synchronized private: I62beb6 (duration: 06m 22s)
  • 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3001.esams.wmnet
  • 13:54 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:54 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:42 hashar@deploy1002: Finished deploy [gerrit/gerrit@1ae182f]: Fix wm-custom-links to show links in footer again - T340372 (duration: 00m 08s)
  • 13:42 hashar@deploy1002: Started deploy [gerrit/gerrit@1ae182f]: Fix wm-custom-links to show links in footer again - T340372
  • 13:39 moritzm: failover ganeti master in esams to ganeti3003
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3002.esams.wmnet
  • 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3002.esams.wmnet
  • 13:38 sukhe: sudo cumin 'A:dns-auth' 'enable-puppet "merging CR 926509"'
  • 13:37 jbond: remove puppetserver from puppet-merge
  • 13:36 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:36 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:36 reedy@deploy1002: Finished scap: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org" (duration: 08m 51s)
  • 13:35 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:35 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3002.esams.wmnet
  • 13:30 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:28 reedy@deploy1002: legoktm and reedy: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:28 sukhe: sudo cumin 'A:dns-auth' 'disable-puppet "merging CR 926509"'
  • 13:28 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:27 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:27 reedy@deploy1002: Started scap: Backport for Revert "Add <link rel="me"> to verify Mastodon account on mediawiki.org"
  • 13:26 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:25 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:25 reedy@deploy1002: Finished scap: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617) (duration: 09m 19s)
  • 13:23 btullis@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3002.esams.wmnet
  • 13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3003.esams.wmnet
  • 13:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3003.esams.wmnet
  • 13:20 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:17 reedy@deploy1002: reedy and lucaswerkmeister-wmde: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:16 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:16 reedy@deploy1002: Started scap: Backport for Set $wgWBRepoSettings['defaultEntityNamespaces'] to false (T291617)
  • 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3003.esams.wmnet
  • 13:15 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:14 reedy@deploy1002: Finished scap: Backport for eowikisource: Add project namespace alias (T340609) (duration: 08m 18s)
  • 13:12 jbond: add puppetserver to puppet-merge
  • 13:09 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2027*} and A:cp
  • 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3003.esams.wmnet
  • 13:07 reedy@deploy1002: reedy and anzx: Backport for eowikisource: Add project namespace alias (T340609) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:05 reedy@deploy1002: Started scap: Backport for eowikisource: Add project namespace alias (T340609)
  • 13:05 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 13:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 13:05 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-test-worker1003.eqiad.wmnet']
  • 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 12:53 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw and not P{cp2027*} and A:cp
  • 12:46 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027*} and A:cp
  • 12:44 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027*} and A:cp
  • 12:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 12:29 moritzm: failover ganeti master in eqsin to ganeti5007
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 12:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 11:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 11:33 volans@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:33 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:33 volans@cumin2002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
  • 11:33 volans@cumin2002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 11:18 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 11:08 claime: Reverting migration to rsync::quickdatacopy for deployment servers - T289857
  • 11:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 11:04 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart to pick up Java 11 - elukey@cumin1001
  • 11:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 11:02 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:58 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:55 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:52 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:51 claime: Migrating to rsync::quickdatacopy for deployment servers - T289857
  • 10:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:50 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:50 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:47 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart to pick up Java 11 - elukey@cumin1001
  • 10:47 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up Java 11 - elukey@cumin1001
  • 10:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:42 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:42 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:41 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:38 fabfur@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:35 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:34 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:31 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:29 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up Java 11 - elukey@cumin1001
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:21 hnowlan: disabling puppet on A:cp-text for testing 933508
  • 10:20 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 10:11 vgutierrez: repool cp4037
  • 10:05 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:02 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:01 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 8 hosts with reason: Decommissioning
  • 09:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 8 hosts with reason: Decommissioning
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 09:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
  • 09:09 vgutierrez: depool cp4037 for some ATS tests
  • 09:08 moritzm: failover ganeti master in ulsfo to ganeti4008
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 08:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 08:40 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 08:15 marostegui: Failover m5-master to dbproxy1027 T337812
  • 08:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 08:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 08:07 marostegui: Failover m2-master to dbproxy1025 T337812
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 08:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:08 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 07:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 06:42 marostegui: Failover m1-master to dbproxy1024 T337812
  • 01:37 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 02m 20s)
  • 01:35 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
  • 01:24 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 01m 58s)
  • 01:22 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui

2023-06-27

  • 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host xhgui2002.codfw.wmnet
  • 23:58 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host xhgui2002.codfw.wmnet with OS bookworm
  • 23:49 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host xhgui1002.eqiad.wmnet
  • 23:49 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host xhgui1002.eqiad.wmnet with OS bookworm
  • 23:43 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on xhgui2002.codfw.wmnet with reason: host reimage
  • 23:40 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on xhgui2002.codfw.wmnet with reason: host reimage
  • 23:34 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on xhgui1002.eqiad.wmnet with reason: host reimage
  • 23:31 denisse@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on xhgui1002.eqiad.wmnet with reason: host reimage
  • 23:23 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host xhgui2002.codfw.wmnet with OS bookworm
  • 23:23 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui2002.codfw.wmnet - denisse@cumin1001"
  • 23:22 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui2002.codfw.wmnet - denisse@cumin1001"
  • 23:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) xhgui2002.codfw.wmnet on all recursors
  • 23:22 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache xhgui2002.codfw.wmnet on all recursors
  • 23:22 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:22 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui2002.codfw.wmnet - denisse@cumin1001"
  • 23:21 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui2002.codfw.wmnet - denisse@cumin1001"
  • 23:20 denisse@cumin1001: START - Cookbook sre.hosts.reimage for host xhgui1002.eqiad.wmnet with OS bookworm
  • 23:20 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
  • 23:19 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
  • 23:19 denisse@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) xhgui1002.eqiad.wmnet on all recursors
  • 23:19 denisse@cumin1001: START - Cookbook sre.dns.wipe-cache xhgui1002.eqiad.wmnet on all recursors
  • 23:19 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:19 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
  • 23:18 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:18 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM xhgui1002.eqiad.wmnet - denisse@cumin1001"
  • 23:18 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host xhgui2002.codfw.wmnet
  • 23:16 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 23:16 denisse@cumin1001: START - Cookbook sre.ganeti.makevm for new host xhgui1002.eqiad.wmnet
  • 22:43 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 01m 27s)
  • 22:42 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
  • 21:50 mutante: prometheus4002 - sudo a2dismod access_compat ; sudo systemctl restart apache2 ; sudo apachectl configtest -> Syntax OK :) - to prove it works without the access_compat module T258686
  • 21:45 mutante: prometheus* - puppet and partially manual restart of apaches after deploying gerrit:932443
  • 20:50 TheresNoTime: close UTC late backport window
  • 20:48 samtar@deploy1002: Finished scap: Backport for Title: Fix exists() assertion in toPageRecord() (T340568) (duration: 06m 52s)
  • 20:43 samtar@deploy1002: matmarex and samtar: Backport for Title: Fix exists() assertion in toPageRecord() (T340568) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:41 samtar@deploy1002: Started scap: Backport for Title: Fix exists() assertion in toPageRecord() (T340568)
  • 20:20 samtar@deploy1002: Finished scap: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193) (duration: 06m 53s)
  • 20:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:16 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:15 samtar@deploy1002: reedy and esanders and samtar: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:13 samtar@deploy1002: Started scap: Backport for Remove most DiscussionTools feature configs (T322497), Remove references to auth-api.php (T204193)
  • 20:13 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 20:10 samtar@deploy1002: Finished scap: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497) (duration: 07m 35s)
  • 20:04 samtar@deploy1002: esanders and samtar and matmarex: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 samtar@deploy1002: Started scap: Backport for Remove unused config $wgVisualEditorAllowLossySwitching (T339871), Remove wgDiscussionToolsEnable config (T322497)
  • 20:00 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:59 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004 (duration: 00m 38s)
  • 19:59 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004
  • 19:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: patch application
  • 19:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: patch application
  • 19:56 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: patch application
  • 19:56 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: patch application
  • 19:55 kindrobot@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:54 kindrobot@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:53 kindrobot@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 19:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:33 brennen@deploy1002: Finished scap: Backport for Drop redundant targets (T340499) (duration: 07m 51s)
  • 19:27 brennen@deploy1002: brennen: Backport for Drop redundant targets (T340499) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 19:25 brennen@deploy1002: Started scap: Backport for Drop redundant targets (T340499)
  • 19:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 19:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:47 sukhe: upgrade dns6001 to gdnsd 3.99.0~alpha2
  • 18:41 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.15 refs T340243
  • 18:40 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:31 brennen@deploy1002: Finished scap: Backport for Display the language button on pages without languages (T315036) (duration: 08m 53s)
  • 18:29 jhathaway: puppet re-enabled, enjoy!
  • 18:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:24 brennen@deploy1002: abi and brennen: Backport for Display the language button on pages without languages (T315036) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:22 brennen@deploy1002: Started scap: Backport for Display the language button on pages without languages (T315036)
  • 18:18 jhathaway: disabling puppet to test stdlib upgrade patch
  • 17:45 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:45 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:45 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:45 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:44 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:44 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:22 brennen@deploy1002: Pruned MediaWiki: 1.41.0-wmf.12 (duration: 02m 05s)
  • 17:20 brennen@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.15 refs T340243 (duration: 42m 56s)
  • 16:49 mutante: webperf1003/2003 restarted apache after deploying gerrit:932441
  • 16:37 brennen@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.15 refs T340243
  • 16:36 brennen: train 1.41.0-wmf.15: re-running scap stage-train (T340243)
  • 16:03 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:51 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:36 jbond: puppet-merge fixed again
  • 15:35 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:34 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 15:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 15:33 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:32 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:32 root@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 15:24 root@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 15:24 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:24 jbond: puppet-merge temporarily broken
  • 15:23 jbond: hi all fyi i have temporarily broken puppet-merge, fix is being done
  • 15:23 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:23 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:21 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:20 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 15:01 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:53 mforns@deploy1002: Finished deploy [airflow-dags/analytics@5e77b01]: (no justification provided) (duration: 00m 10s)
  • 14:52 mforns@deploy1002: Started deploy [airflow-dags/analytics@5e77b01]: (no justification provided)
  • 14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 14:46 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 14:41 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 14:27 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 14:23 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 14:21 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 14:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 14:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:04 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 13:32 elukey: expand ml-staging200[12] kubelet partitions - T339231
  • 13:27 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:26 joal@deploy1002: Finished deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7] (duration: 00m 09s)
  • 13:26 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:26 joal@deploy1002: Started deploy [airflow-dags/analytics@9eca77f]: Regular analytics weekly train [airflow-dags/analytics@9eca77f7]
  • 13:18 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:06 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:57 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 12:57 marostegui: Failover m3-master to dbproxy1026 T337812
  • 11:55 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on enwiki (T339867) (duration: 12m 06s)
  • 11:51 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 11:50 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 11:44 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on enwiki (T339867) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 11:43 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on enwiki (T339867)
  • 11:21 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on dewiki (T339867) (duration: 08m 34s)
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 11:14 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on dewiki (T339867) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:12 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on dewiki (T339867)
  • 11:08 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2] (duration: 01m 43s)
  • 11:06 joal@deploy1002: Started deploy [analytics/refinery@259c5e2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@259c5e2]
  • 11:06 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2] (duration: 00m 04s)
  • 11:06 joal@deploy1002: Started deploy [analytics/refinery@259c5e2] (thin): Regular analytics weekly train THIN [analytics/refinery@259c5e2]
  • 11:04 joal@deploy1002: Finished deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2] (duration: 08m 23s)
  • 11:02 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 10:55 joal@deploy1002: Started deploy [analytics/refinery@259c5e2]: Regular analytics weekly train [analytics/refinery@259c5e2]
  • 10:48 elukey@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 10:43 hnowlan: disabling puppet on A:cp-text to test rollout of r/929674
  • 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 10:33 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 10:32 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 10:30 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Roll restart to pick up new certs and openjdk version - elukey@cumin1001
  • 10:30 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 10:29 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 10:26 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 10:25 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 10:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:10 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:06 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:06 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:06 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:05 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:03 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:03 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:01 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1002.eqiad.wmnet with OS bullseye
  • 10:01 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:01 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:56 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:56 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:56 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:54 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:41 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 09:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: host reimage
  • 09:36 akosiaris@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 16s)
  • 09:35 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin and not P{cp5032*} and A:cp
  • 09:34 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1002.eqiad.wmnet with reason: host reimage
  • 09:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@1ddd94b] (releasing): (no justification provided) (duration: 00m 51s)
  • 09:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@1ddd94b] (releasing): (no justification provided)
  • 09:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1002.eqiad.wmnet with OS bullseye
  • 09:20 moritzm: installing libvirt bugfix updates from Bullseye point release
  • 09:12 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 09:12 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin and not P{cp5032*} and A:cp
  • 09:11 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:11 kart_: Updated MinT to 2023-06-27-053706-production (T339896, T340236)
  • 09:10 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:10 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:09 vgutierrez: repool cp1082
  • 09:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:09 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:07 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 09:02 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 09:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 09:00 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet
  • 08:58 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet
  • 08:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes100[0-9].eqiad.wmnet
  • 08:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes200[0-9].codfw.wmnet
  • 08:53 akosiaris@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 07m 21s)
  • 08:52 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 08:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 08:45 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 08:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:42 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:41 fabfur@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 08:41 kart_: Updated cxserver to 2023-06-27-053435-production (T339105)
  • 08:38 elukey: revoked puppet cert for 'varnishkafka' and cleaned up its cergen's files in puppet private - T337825
  • 08:33 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 19 hosts
  • 08:33 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 19 hosts
  • 08:32 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:32 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 767 hosts
  • 08:32 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:32 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 767 hosts
  • 08:31 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Neil P. Quinn-WMF out of all services on: 1265 hosts
  • 08:30 root@cumin2002: START - Cookbook sre.idm.logout Logging Neil P. Quinn-WMF out of all services on: 1265 hosts
  • 08:29 marostegui: Failover m2-master to dbproxy1022 T337812
  • 08:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:25 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:24 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123) (duration: 16m 17s)
  • 08:03 moritzm: installing openjdk-8 security updates for bullseye
  • 08:02 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:58 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for 4 Wikipedias (T338123)
  • 07:54 moritzm: uploaded openjdk-8 8u372-ga-1~deb11u1 to component/jdk8 for bullseye (forward port of Java 8 for Buster)
  • 07:48 hashar: Restart Zuul due to stuck connection | T340518 | T309376
  • 07:15 elukey: `sudo kill `pgrep -u paramd`` on stat1005 to unblock puppet
  • 06:22 marostegui: Failover m1-master to dbproxy1022 T337812

2023-06-26

  • 23:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery
  • 23:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1092.eqiad.wmnet with reason: Replacing RAID controller battery
  • 23:07 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 23:02 sbassett: Deployed updated mitigation for T336027
  • 23:01 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 22:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 22:51 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 22:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 22:33 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 22:31 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 22:24 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 22:18 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:17 ryankemper@cumin1001: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
  • 22:17 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 22:17 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 22:16 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 22:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:58 eevans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 21:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 21:54 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 21:53 eevans@cumin2002: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 21:53 urandom: pooling sessionstore/codfw for bullseye upgrades — T340043
  • 21:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:44 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS bullseye
  • 21:43 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 21:39 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:36 ryankemper@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 21:26 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:22 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 21:22 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 21:21 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 21:18 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 21:15 ryankemper@cumin1001: START - Cookbook sre.wdqs.restart
  • 21:13 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2022.*
  • 21:13 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 21:13 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:02 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye
  • 20:55 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2003.codfw.wmnet with OS bullseye
  • 20:45 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS bullseye
  • 20:42 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 20:34 brennen@deploy1002: Finished deploy [phabricator/deployment@0529926]: deploy latest state to phab1004 (duration: 00m 31s)
  • 20:33 brennen@deploy1002: Started deploy [phabricator/deployment@0529926]: deploy latest state to phab1004
  • 20:30 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004 (duration: 00m 34s)
  • 20:30 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: patch application
  • 20:30 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab2002.codfw.wmnet with reason: patch application
  • 20:30 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab1004
  • 20:29 brennen@deploy1002: Finished deploy [phabricator/deployment@a25a737]: deploy latest state to phab2002 (duration: 00m 38s)
  • 20:29 brennen@deploy1002: Started deploy [phabricator/deployment@a25a737]: deploy latest state to phab2002
  • 20:28 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: patch application
  • 20:28 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab1004.eqiad.wmnet with reason: patch application
  • 20:27 brennen: deploying minor phabricator updates shortly
  • 20:27 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: first setup
  • 20:27 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on phab1004.eqiad.wmnet with reason: first setup
  • 20:18 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 20:16 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 20:00 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 19:49 akosiaris: force puppet run on cp hosts T340483
  • 19:48 akosiaris: revert "Redirect www.mediawiki.org to mw-on-k8s", debugging T340483
  • 19:24 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS bullseye
  • 19:02 eevans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 18:57 eevans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 18:42 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS bullseye
  • 18:38 eevans@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 18:33 eevans@cumin2002: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 18:33 urandom: depooling sessionstore/codfw for bullseye upgrades — T340043
  • 18:07 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 18:07 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 18:06 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 18:05 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 18:05 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 18:05 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 18:04 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:04 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:03 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:03 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@32b4b99]: update dags to use discolytics 0.15.0 (duration: 00m 17s)
  • 18:03 ebernhardson@deploy1002: Started deploy [airflow-dags/search@32b4b99]: update dags to use discolytics 0.15.0
  • 18:02 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:53 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:53 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:21 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:21 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:19 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:18 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:52 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:45 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:41 moritzm: installing Java 8 security updates on stat* hosts
  • 15:28 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:26 sukhe: upgrade dns5003 to gdnsd 3.99.0~alpha2
  • 15:26 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:25 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:11 sukhe: re-enable puppet on P{C:bird::anycast_healthchecker} and finish rolling out CR 922514
  • 15:01 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 15:01 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 15:00 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 15:00 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 14:55 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 14:55 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 14:54 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:53 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:53 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:53 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 14:51 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:51 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@7db3f9b]: Fix up attribution name in wm-app-theme.js plugin (duration: 00m 08s)
  • 14:46 hashar@deploy1002: Started deploy [gerrit/gerrit@7db3f9b]: Fix up attribution name in wm-app-theme.js plugin
  • 14:40 sukhe: rolling out CR 922514 to A:durum: T336792
  • 14:40 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:40 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:37 sukhe: rolling out CR 922514 to A:dns-auth: T336792
  • 14:32 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:32 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:31 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:30 sukhe: rolling out CR 922514 to A:wikidough (-s1 -b30): T336792
  • 14:30 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:28 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:28 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:23 sukhe: restart pdns-rec.service on doh6001 to test systemd binding to anycast-hc
  • 14:19 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:17 sukhe: sudo cumin 'P{C:bird::anycast_healthchecker}' 'disable-puppet "merging CR 922514"'
  • 14:16 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:06 elukey: move varnishkafka instances in esams to pki
  • 13:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:50 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 13:48 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 13:46 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 13:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:45 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:40 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:39 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:29 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:29 sukhe: sudo cumin 'A:dns-auth' 'enable-puppet "merging CR 932248"'
  • 13:26 daniel@deploy1002: Finished scap: Backport for Parsoid: Disable PC writes on frwiki (T339867) (duration: 10m 20s)
  • 13:25 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:22 sukhe: sudo cumin 'A:dns-auth' 'disable-puppet "merging CR 932248"'
  • 13:18 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@b3751e6]: (no justification provided) (duration: 00m 09s)
  • 13:18 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@b3751e6]: (no justification provided)
  • 13:17 daniel@deploy1002: daniel: Backport for Parsoid: Disable PC writes on frwiki (T339867) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:17 tchin@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 13:15 daniel@deploy1002: Started scap: Backport for Parsoid: Disable PC writes on frwiki (T339867)
  • 13:05 claime: parse1012 pooled inactive for flapping investigation
  • 13:03 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1012.eqiad.wmnet
  • 11:59 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices2005-dev
  • 11:59 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices2005-dev
  • 11:00 moritzm: installing libfastjson security updates
  • 10:33 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudservices2005-dev - aborrero@cumin2002"
  • 10:32 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudservices2005-dev - aborrero@cumin2002"
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts krb2001.codfw.wmnet
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:25 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices2005-dev.codfw.wmnet with OS bullseye
  • 10:25 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 10:25 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: krb2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:24 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 10:21 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:19 claime: mw-on-k8s: Redirect www.mediawiki.org to mw-on-k8s - T337490
  • 10:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts krb2001.codfw.wmnet
  • 10:01 claime: mw-on-k8s: Redirect closed wikis to mw-on-k8s - T337490
  • 09:40 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 09:37 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2005-dev.codfw.wmnet with reason: host reimage
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 09:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 09:18 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bullseye
  • 09:17 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudservices2005-dev.codfw.wmnet with OS bullseye
  • 09:17 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudservices2005-dev
  • 09:17 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudservices2005-dev
  • 09:11 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2005-dev.codfw.wmnet with OS bullseye
  • 09:10 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.codfw.wmnet on all recursors
  • 09:10 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.codfw.wmnet on all recursors
  • 09:10 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.mgmt.codfw.wmnet on all recursors
  • 09:10 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.mgmt.codfw.wmnet on all recursors
  • 09:09 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:09 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
  • 09:08 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
  • 09:06 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 08:19 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 19 hosts
  • 08:18 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 19 hosts
  • 08:18 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 771 hosts
  • 08:17 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 771 hosts
  • 08:15 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Paramita Das out of all services on: 1261 hosts
  • 08:14 root@cumin2002: START - Cookbook sre.idm.logout Logging Paramita Das out of all services on: 1261 hosts
  • 08:07 aborrero@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 08:07 aborrero@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 08:07 aborrero@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 08:06 aborrero@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 07:48 taavi@deploy1002: Finished scap: Backport for extwiki: Add an alias for old NS_PROJECT name (duration: 08m 49s)
  • 07:41 taavi@deploy1002: taavi: Backport for extwiki: Add an alias for old NS_PROJECT name synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:39 taavi@deploy1002: Started scap: Backport for extwiki: Add an alias for old NS_PROJECT name
  • 07:37 taavi@deploy1002: Sync cancelled.
  • 07:36 taavi@deploy1002: taavi: Backport for extwiki: Update project namespace name (T337696) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:34 taavi@deploy1002: Started scap: Backport for extwiki: Update project namespace name (T337696)
  • 07:31 taavi@deploy1002: Sync cancelled.
  • 07:16 taavi@deploy1002: anzx and taavi: Backport for Change dewiki import sources (T340264), Rename namespace on extwiki (T337696) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:07 taavi@deploy1002: Started scap: Backport for Change dewiki import sources (T340264), Rename namespace on extwiki (T337696)
  • 06:28 kart_: Updated cxserver to 2023-06-26-050753-production (T340236, T339896)
  • 06:27 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:26 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1118 from dbctl T326683', diff saved to https://phabricator.wikimedia.org/P49477 and previous config saved to /var/cache/conftool/dbconfig/20230626-062036-marostegui.json
  • 06:15 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:11 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:10 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-06-25

  • 01:45 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
  • 01:35 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui (duration: 04m 05s)
  • 01:31 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: installing (but not registering) magnum-ui
  • 01:30 andrew@deploy1002: deploy aborted: asdf (duration: 00m 01s)
  • 01:30 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: asdf

2023-06-23

  • 16:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 16:23 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:02 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:51 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:27 urbanecm@deploy1002: Finished scap: Backport for Section images: Placeholder should serialize to empty string (T340170) (duration: 06m 56s)
  • 14:26 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 14:21 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 14:20 urbanecm@deploy1002: Started scap: Backport for Section images: Placeholder should serialize to empty string (T340170)
  • 14:12 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: HW issues
  • 14:12 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: HW issues
  • 13:35 Emperor: update private wiki container ACLs in eqiad-swift
  • 13:30 Emperor: update private wiki container ACLs in codfw-swift
  • 13:29 godog: add 200G to prometheus/k8s in eqiad
  • 12:40 elukey: move varnishkafka drmrs instances to pki
  • 12:10 Emperor: updating ACLs on wikipedia-office containers T340189 T338765
  • 11:24 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:02 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:27 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1110.eqiad.wmnet
  • 10:20 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
  • 10:12 moritzm: installing vim security updates
  • 09:26 moritzm: uploaded openjdk-8 8u372-ga-1~deb10u1 to component/jdk8 (forward port of Java 8 for Buster)
  • 09:20 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-worker1110.eqiad.wmnet
  • 08:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-cache1001.eqiad.wmnet with reason: Working on pki
  • 08:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-cache1001.eqiad.wmnet with reason: Working on pki
  • 08:37 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1110.eqiad.wmnet
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14860
  • 05:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14860
  • 04:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118', diff saved to https://phabricator.wikimedia.org/P49472 and previous config saved to /var/cache/conftool/dbconfig/20230623-045758-root.json
  • 01:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 01:15 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer

2023-06-22

  • 21:00 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host phab-test1001.eqiad.wmnet
  • 19:41 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host phab-test1001.eqiad.wmnet with OS buster
  • 19:29 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab-test1001.eqiad.wmnet with reason: host reimage
  • 19:26 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on phab-test1001.eqiad.wmnet with reason: host reimage
  • 19:25 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:14 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
  • 19:13 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:12 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
  • 19:11 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
  • 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:11 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:11 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 19:09 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 17:32 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=97) Rolling upgrade/restart of Apache Traffic Server on P{cp1082*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:c
  • 17:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp1082*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
  • 17:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:03 brett@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-wikimedia-dns (exit_code=0) rolling restart_daemons on P{doh6001*} and A:wikidough
  • 17:03 brett@cumin2002: START - Cookbook sre.dns.roll-restart-wikimedia-dns rolling restart_daemons on P{doh6001*} and A:wikidough
  • 16:54 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 16:27 eevans@cumin2002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:26 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:24 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:24 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:23 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:22 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:21 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:21 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:17 eevans@cumin2002: END (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:17 eevans@cumin2002: START - Cookbook sre.puppet.renew-cert for sessionstore2001.codfw.wmnet: Renew puppet certificate - eevans@cumin2002
  • 16:07 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 16:00 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:58 eevans@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 15:52 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 15:52 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:50 sukhe: running authdns-update to repool codfw
  • 15:48 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:48 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:46 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:38 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:34 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:32 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:29 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:22 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:01 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:53 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:50 sukhe: upgrade dns3001 to gdnsd 3.99.0~alpha2
  • 14:47 eevans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:37 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:32 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 14:20 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:12 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cloudservices2004-dev.wikimedia.org
  • 14:10 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cloudservices2004-dev.wikimedia.org
  • 14:07 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 14:07 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 14:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:03 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 14:01 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:00 volans@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 14:00 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:00 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for GrowthExperiments: Deploy section-level images structured task (T339126) (duration: 12m 49s)
  • 13:54 volans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 13:48 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and tgr: Backport for GrowthExperiments: Deploy section-level images structured task (T339126) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for GrowthExperiments: Deploy section-level images structured task (T339126)
  • 13:17 elukey: move varnishkafka instances in eqiad to PKI
  • 13:16 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 13:15 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 13:15 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 13:14 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 13:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 13:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 13:11 samtar@deploy1002: Finished scap: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763) (duration: 09m 26s)
  • 13:03 samtar@deploy1002: samtar: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:02 samtar@deploy1002: Started scap: Backport for IS: Enable Phonos on 'small' projects, set PhonosInlineAudioPlayerMode (T336763)
  • 12:32 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:32 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:28 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:28 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:28 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:27 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:26 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:26 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:25 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:25 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:25 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:17 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:06 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:06 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:06 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:06 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:05 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:04 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:04 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:04 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:03 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 12:03 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:57 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:57 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:45 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:45 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:44 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:44 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:41 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:41 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 11:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 11:33 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 11:33 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 11:32 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 11:32 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 11:32 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001']
  • 11:32 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001']
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2002.codfw.wmnet to plain
  • 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2002.codfw.wmnet to plain
  • 10:42 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:33 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:33 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:33 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:32 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:32 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:31 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:29 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:29 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 10:25 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 10:24 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 10:23 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 10:23 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 10:22 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:07 moritzm: installing Apache security updates on Bullseye
  • 09:51 ladsgroup@deploy1002: Finished scap: Backport for Fix adding a domain when the page doesn't exist (duration: 08m 05s)
  • 09:44 ladsgroup@deploy1002: ladsgroup: Backport for Fix adding a domain when the page doesn't exist synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:43 ladsgroup@deploy1002: Started scap: Backport for Fix adding a domain when the page doesn't exist
  • 09:40 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:40 root@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 09:33 root@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 09:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:29 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:29 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:27 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:26 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
  • 09:12 vgutierrez: increasing maxconns to 2000 in haproxy for port 80 - T339898
  • 08:50 vgutierrez: tighten HAProxy timeouts on port 80 globally - T339898
  • 08:23 jbond@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test 931926 - jbond@cumin2002"
  • 08:22 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test 931926 - jbond@cumin2002"
  • 07:43 moritzm: installing containerd security updates
  • 06:55 apergos: rsync in ariel screen session on dumpsdata1003 pulling from dumpsdata1004, bwlimit 100000 (=1G) of misc dumps files
  • 06:39 kart_: Updated cxserver to 2023-06-21-112200-production (T339896, T338123)
  • 06:38 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:38 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:36 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:35 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:35 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy102[47] - marostegui@cumin1001"
  • 06:34 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy102[47] - marostegui@cumin1001"
  • 06:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:32 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:29 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1023 - marostegui@cumin1001"
  • 06:29 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1023 - marostegui@cumin1001"
  • 06:27 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 05:57 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:17 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1022 - marostegui@cumin1001"
  • 05:16 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove ipv6 for dbproxy1022 - marostegui@cumin1001"
  • 05:14 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 03:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 03:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 03:07 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 03:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 02:52 rzl@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 02:51 rzl@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 02:37 rzl@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 02:35 rzl@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1001.eqiad.wmnet
  • 00:46 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 00:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bullseye
  • 00:40 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 00:40 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
  • 00:33 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host phab-test1001.eqiad.wmnet
  • 00:33 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host phab-test1001.eqiad.wmnet with OS buster
  • 00:25 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 00:22 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 00:10 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bullseye
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 00:09 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 00:07 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
  • 00:02 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 00:01 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
  • 00:01 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
  • 00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:01 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 00:00 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"

2023-06-21

  • 23:58 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:58 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
  • 23:56 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1002.eqiad.wmnet
  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
  • 23:56 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:56 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:55 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:53 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
  • 23:53 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
  • 23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:53 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:52 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
  • 23:50 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:50 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1002.eqiad.wmnet
  • 23:50 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1002.eqiad.wmnet
  • 23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
  • 23:50 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
  • 23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:50 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:49 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:49 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
  • 23:47 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1002.eqiad.wmnet on all recursors
  • 23:47 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1002.eqiad.wmnet on all recursors
  • 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:47 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:46 tstarling@deploy1002: Synchronized multiversion: Fix some mwscript bugs and clean up the style (duration: 06m 31s)
  • 23:46 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1002.eqiad.wmnet - dzahn@cumin1001"
  • 23:42 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:42 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1002.eqiad.wmnet
  • 23:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host phab-test1001.eqiad.wmnet
  • 23:35 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 23:34 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:32 urbanecm: Move a large translatable page on foundationwiki (T338217)
  • 23:32 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:32 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
  • 23:30 urbanecm: Move a large translatable page (T339154)
  • 23:27 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host phab-test1001.eqiad.wmnet
  • 23:27 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host phab-test1001.eqiad.wmnet with OS buster
  • 23:27 urbanecm: Move large translatable page (`mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki 'Movement Strategy and Governance/Movement Charter Ambassadors Program/grant' 'Movement Charter/Ambassadors Program/Grant' 'Martin Urbanec' --reason='restructuring of the Movement Charter's Meta infrastructure (per request)'`; T338808)
  • 23:15 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:15 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:09 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host phab-test1001.eqiad.wmnet with OS buster
  • 23:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 23:09 mutante: created temporary test VM phab-test1001.eqiad.wmnet which we need for a one-time test for T335080 - it will soon be destroyed again
  • 23:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 23:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) phab-test1001.eqiad.wmnet on all recursors
  • 23:07 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache phab-test1001.eqiad.wmnet on all recursors
  • 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 23:07 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM phab-test1001.eqiad.wmnet - dzahn@cumin1001"
  • 23:02 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 23:02 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host phab-test1001.eqiad.wmnet
  • 23:01 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:00 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:58 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:54 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
  • 22:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
  • 22:52 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp1078.eqiad.wmnet,cp1080.eqiad.wmnet,cp1082.eqiad.wmnet,cp1084.eqiad.wmnet,cp1086.eqiad.wmnet,cp1088.eqiad.wmnet,cp1090.eqiad.wmnet} and A:cp
  • 22:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 22:48 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp1077.eqiad.wmnet,cp1079.eqiad.wmnet,cp1081.eqiad.wmnet,cp1083.eqiad.wmnet,cp1085.eqiad.wmnet,cp1087.eqiad.wmnet,cp1089.eqiad.wmnet} and A:cp
  • 22:39 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:39 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:39 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:39 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:38 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:38 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:36 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:36 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
  • 22:35 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt dbproxy1023 - jclark@cumin1001"
  • 22:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 22:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:33 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 22:33 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts gerrit1001.wikimedia.org
  • 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:26 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gerrit1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 22:25 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: gerrit1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 22:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:19 eevans@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 22:16 mutante: destroying previous production gerrit server gerrit1001 - T336427
  • 22:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts gerrit1001.wikimedia.org
  • 22:10 mutante: rsyncing data from cobalt.wikimedia.org (:p) from gerrit1001 to gerrit1003, /srv/gerrit/cobalt/
  • 21:30 eevans@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 21:28 eevans@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 21:24 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 21:23 eevans@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 21:23 eevans@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:52 kostajh: UTC late deploys done
  • 20:52 kharlan@deploy1002: Finished scap: Backport for Section images: Select placeholder when inserting it (T335209) (duration: 10m 21s)
  • 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people2002.codfw.wmnet
  • 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:45 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:43 kharlan@deploy1002: kharlan: Backport for Section images: Select placeholder when inserting it (T335209) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:42 kharlan@deploy1002: Started scap: Backport for Section images: Select placeholder when inserting it (T335209)
  • 20:41 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:36 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:30 mutante: gerrit1001 (formerly gerrit prod) - creating tarball of entire /home/ in /home/ and copying it over to gerrit1003 - simultaneously adding /home on gerrit servers to bacula from now on - T336427
  • 20:15 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people2002.codfw.wmnet
  • 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts people1003.eqiad.wmnet
  • 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:14 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:13 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: people1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 20:09 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:04 mutante: deleting VMs people1003.eqiad.wmnet and people2002.codfw.wmnet T338827
  • 20:03 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts people1003.eqiad.wmnet
  • 19:59 mutante: people.wikimedia.org - disabling shell access to people1003/people2002 (bullseye), use people1004/people2002 (bookworm) or people.eqiad.wmnet / people.codfw.wmnet in your configs if you have something automated or git repos - T338827
  • 19:28 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:22 ejegg: civicrm upgraded from 4a4b014a to 98b2b5de
  • 19:03 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:03 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:01 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 19:01 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 19:00 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 19:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS bullseye
  • 18:48 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp1078.eqiad.wmnet,cp1080.eqiad.wmnet,cp1082.eqiad.wmnet,cp1084.eqiad.wmnet,cp1086.eqiad.wmnet,cp1088.eqiad.wmnet,cp1090.eqiad.wmnet} and A:cp
  • 18:27 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp1077.eqiad.wmnet,cp1079.eqiad.wmnet,cp1081.eqiad.wmnet,cp1083.eqiad.wmnet,cp1085.eqiad.wmnet,cp1087.eqiad.wmnet,cp1089.eqiad.wmnet} and A:cp
  • 18:24 sukhe: sudo ipmitool -I lanplus -H "sessionstore2001.mgmt.codfw.wmnet" -U root -E mc reset cold
  • 18:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:13 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:12 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:12 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/racktables T327405
  • 18:12 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:09 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/annualreport T337041
  • 18:08 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/bienvenida T337047
  • 18:06 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/TransparencyReport T338781
  • 18:00 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 18:00 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:59 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:59 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:44 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:44 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:43 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:40 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:39 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:39 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:37 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:37 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:36 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:35 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:34 ejegg: civicrm upgraded from b11db56d to 4a4b014a
  • 17:34 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:30 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:29 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 17:27 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:27 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 17:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 17:23 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 17:23 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 17:23 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:22 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:22 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:22 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:21 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:21 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:21 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 17:20 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 17:18 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 17:16 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:16 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 17:14 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 17:13 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 17:01 sukhe: sudo ipmitool -I lanplus -H "sessionstore2001.mgmt.codfw.wmnet" -U root -E chassis power reset
  • 16:47 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:47 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:45 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:45 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 16:43 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:42 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:42 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:40 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:39 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:39 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:39 eevans@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:39 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:39 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:39 sukhe@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:38 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sessionstore2001.codfw.wmnet']
  • 16:38 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:37 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:15 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS bullseye
  • 16:02 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:01 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sretest1003.eqiad.wmnet
  • 16:00 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:58 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts puppetmaster2001.codfw.wmnet
  • 15:46 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3053.*
  • 15:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:45 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp2028.*
  • 15:44 mforns@deploy1002: Finished deploy [airflow-dags/analytics@d9a9135]: (no justification provided) (duration: 00m 09s)
  • 15:44 mforns@deploy1002: Started deploy [airflow-dags/analytics@d9a9135]: (no justification provided)
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=ats-be
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1076.eqiad.wmnet,service=cdn
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=ats-be
  • 15:42 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp1075.eqiad.wmnet,service=cdn
  • 15:27 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS bullseye
  • 15:24 sukhe: run authdns-update to depool codfw
  • 15:24 sukhe: run authdns-update to depool codfw
  • 15:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-upload_eqiad
  • 15:23 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on A:cp-text_eqiad
  • 15:20 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
  • 15:19 moritzm: installing php7.3 security updates
  • 15:18 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
  • 15:09 moritzm: installing joblib security updates
  • 15:08 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 15:03 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:03 urandom: depooling sessionstore/codfw — T340043
  • 14:50 root@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:49 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:47 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:47 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:47 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:40 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:40 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:33 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:33 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 14:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1024.eqiad.wmnet with OS bullseye
  • 14:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 14:23 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 14:21 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:21 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 14:20 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 14:20 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:20 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
  • 14:19 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev - aborrero@cumin2002"
  • 14:17 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 14:11 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:11 bking@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:07 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 14:04 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 14:02 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1024.eqiad.wmnet with reason: host reimage
  • 14:01 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
  • 13:50 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1024.eqiad.wmnet with OS bullseye
  • 13:49 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS bullseye
  • 13:47 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
  • 13:46 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 13:46 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1024 - robh@cumin1001"
  • 13:45 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1024 - robh@cumin1001"
  • 13:44 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
  • 13:43 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 13:43 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 volans: installed spicerack 7.2.1 to the cumin/cloudcumin hosts
  • 13:36 root@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 13:36 root@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 13:31 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
  • 13:30 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 13:30 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 13:30 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 13:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
  • 13:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
  • 13:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1025.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:19 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1025
  • 13:18 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1025
  • 13:18 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1023
  • 13:18 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1023
  • 13:17 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1024
  • 13:16 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1024
  • 13:12 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 12:58 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:53 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:51 elukey: move varnishafka instances in codfw to PKI
  • 12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices2005-dev
  • 12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:47 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 12:42 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:41 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:41 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:39 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:30 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2005-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 12:16 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 12:12 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:08 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:06 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:06 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:03 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test change - jbond@cumin1001"
  • 12:01 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudservices2005-dev
  • 11:40 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 11:40 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 11:40 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 11:40 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 11:39 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 11:39 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 11:39 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 11:39 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 11:02 moritzm: installing python2.7 security updates
  • 10:58 vgutierrez: re-enable puppet in A:cp - T339898
  • 10:57 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 00m 48s)
  • 10:57 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 10:51 volans: uploaded spicerack_7.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 10:37 dcausse@deploy1002: Finished deploy [airflow-dags/search@29d9615]: search: schedule cirrus_consistency_check (take 2) (duration: 00m 10s)
  • 10:37 dcausse@deploy1002: Started deploy [airflow-dags/search@29d9615]: search: schedule cirrus_consistency_check (take 2)
  • 10:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[1124,1133].eqiad.wmnet with reason: Testing cloning
  • 10:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[1124,1133].eqiad.wmnet with reason: Testing cloning
  • 09:59 dcausse@deploy1002: Finished deploy [airflow-dags/search@9c03845]: search: schedule cirrus_consistency_check (duration: 00m 18s)
  • 09:58 dcausse@deploy1002: Started deploy [airflow-dags/search@9c03845]: search: schedule cirrus_consistency_check
  • 09:38 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:21 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:20 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 01m 14s)
  • 09:20 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:19 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 09:18 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:17 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:17 jbond@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:16 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:15 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:14 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "test sync - jbond@cumin1001"
  • 09:06 jbond: disable puppet on R:git::clone to deploy gerrit:927750
  • 08:36 vgutierrez: disable puppet on A:cp before merging Ie84c15
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 07:23 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 07:22 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 07:21 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 07:21 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 07:13 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 06:44 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:44 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 06:40 hashar@deploy1002: Finished deploy [integration/docroot@51d2552]: Add TimedMediaHandler to docroot - T338458 (duration: 00m 11s)
  • 06:40 hashar@deploy1002: Started deploy [integration/docroot@51d2552]: Add TimedMediaHandler to docroot - T338458
  • 06:07 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dumpsdata1003.eqiad.wmnet with reason: host reimage
  • 06:04 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dumpsdata1003.eqiad.wmnet with reason: host reimage
  • 06:03 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 00:20 tzatziki: removing one file for legal compliance
  • 00:13 tzatziki: removing 2 files for legal compliance
  • 00:11 tzatziki: removing one file for legal compliance

2023-06-20

  • 23:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 22:47 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2021.codfw.wmnet with OS buster
  • 22:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 22:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 22:19 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 22:16 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 22:13 tgr_: UTC late backports done
  • 22:12 tgr@deploy1002: Finished scap: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225) (duration: 22m 30s)
  • 22:01 tgr@deploy1002: tgr and kharlan: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_esams
  • 21:59 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 21:49 tgr@deploy1002: Started scap: Backport for Section images: Fix ve.scrollIntoView override (T339900 T335209), Backport translations from master (T339225)
  • 21:36 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - page_content_change should use eventgate-analytics-external for canary events - T336817 (duration: 07m 22s)
  • 21:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 21:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 21:25 sbassett: Deployed updated mitigation for T336027
  • 21:18 tgr@deploy1002: Finished scap: Backport for Remove unused data attribs on a/v sources (T199129) (duration: 18m 45s)
  • 21:01 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
  • 21:01 tgr@deploy1002: jforrester and tgr: Backport for Remove unused data attribs on a/v sources (T199129) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:00 tgr@deploy1002: Started scap: Backport for Remove unused data attribs on a/v sources (T199129)
  • 20:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 20:47 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:46 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:46 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 20:46 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:44 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:42 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:42 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:31 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
  • 20:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 20:30 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 20:29 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
  • 20:26 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
  • 20:25 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
  • 20:16 samtar@deploy1002: Finished scap: Backport for Turn off Zebra test for multiple wikis (T337956) (duration: 13m 32s)
  • 20:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 20:13 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 20:09 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
  • 20:03 samtar@deploy1002: ksarabia and samtar: Backport for Turn off Zebra test for multiple wikis (T337956) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for Turn off Zebra test for multiple wikis (T337956)
  • 19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts parse1002.eqiad.wmnet
  • 19:30 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 19:30 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 19:28 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 19:13 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:13 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:13 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
  • 19:13 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:09 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:08 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:07 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:06 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:05 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 19:05 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:04 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 18:54 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:54 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:50 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:47 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:47 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:37 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:37 robh@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 18:33 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:28 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:28 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:28 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:26 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:24 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:24 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:18 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:17 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 18:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@d55173d]: (no justification provided) (duration: 00m 11s)
  • 18:12 mforns@deploy1002: Started deploy [airflow-dags/analytics@d55173d]: (no justification provided)
  • 18:03 joal@deploy1002: Finished deploy [analytics/refinery@181eac6] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@181eac6] (duration: 01m 52s)
  • 18:01 joal@deploy1002: Started deploy [analytics/refinery@181eac6] (hadoop-test): Hotfix analytics deploy TEST [analytics/refinery@181eac6]
  • 18:01 joal@deploy1002: Finished deploy [analytics/refinery@181eac6] (thin): Hotfix analytics deploy THIN [analytics/refinery@181eac6] (duration: 00m 04s)
  • 18:01 joal@deploy1002: Started deploy [analytics/refinery@181eac6] (thin): Hotfix analytics deploy THIN [analytics/refinery@181eac6]
  • 18:00 joal@deploy1002: Finished deploy [analytics/refinery@181eac6]: Hotfix analytics deploy [analytics/refinery@181eac6] (duration: 06m 22s)
  • 17:54 joal@deploy1002: Started deploy [analytics/refinery@181eac6]: Hotfix analytics deploy [analytics/refinery@181eac6]
  • 17:54 sukhe: running authdns-update for T339942
  • 17:44 ottomata: remove stream-enrichment-poc namespace and related resources from dse-k8s-eqiad - T325303
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:44 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
  • 16:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:55 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
  • 16:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 16:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 16:52 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:49 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
  • 16:49 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3053.esams.wmnet,cp3055.esams.wmnet,cp3057.esams.wmnet,cp3059.esams.wmnet,cp3061.esams.wmnet,cp3063.esams.wmnet,cp3065.esams.wmnet} and A:cp
  • 16:44 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 16:44 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:28 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: wgEventStreams - remove unused rc stream names for page_change related streams - T336817 (duration: 07m 35s)
  • 16:21 sukhe: sudo cumin 'A:cp' 'enable-puppet "merging CR 931626"'
  • 16:17 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: wgEventBusStreamNamesMap - Remove page_change stream name override - T336817 (duration: 07m 42s)
  • 16:14 sukhe: sudo cumin 'A:cp' 'disable-puppet "merging CR 931626"'
  • 16:09 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 16:09 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 15:25 moritzm: installing unbound security updates
  • 15:14 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:13 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:13 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:13 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:55 bking@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:55 bking@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:42 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 14:36 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 14:36 arturo: homer run for CR eqiad/codfw to allow bacula traffic in from cloud-hosts (T338132, T339894)
  • 14:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:24 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host parse1002.eqiad.wmnet
  • 14:16 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes20[12][0-9].codfw.wmnet
  • 14:15 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=kubernetes10[12][0-9].eqiad.wmnet
  • 14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes202[0-9].codfw.wmnet
  • 14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes201[0-9].codfw.wmnet
  • 14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes102[0-9].eqiad.wmnet
  • 14:15 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0-9].eqiad.wmnet
  • 14:14 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes1*.eqiad.wmnet
  • 14:11 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4051.ulsfo.wmnet} and A:cp
  • 14:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4044.ulsfo.wmnet,cp4051.ulsfo.wmnet} and A:cp
  • 14:06 vgutierrez: test HAProxy 2.6.14 on cp4044 and cp4051
  • 14:03 vgutierrez: fetch HAProxy 2.6.14 on thirdparty/haproxy26 for bullseye (apt.wm.o)
  • 13:22 vgutierrez: repooling cp3050 - T339898
  • 13:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:18 moritzm: installing python2.7 security updates
  • 13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts otrs1001.eqiad.wmnet
  • 13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 aokoth@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: otrs1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 13:14 urbanecm@deploy1002: Finished scap: Backport for Enable Extension:Translate on pt.wikisource.org (T339139) (duration: 09m 11s)
  • 13:13 aokoth@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: otrs1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aokoth@cumin1001"
  • 13:10 aokoth@cumin1001: START - Cookbook sre.dns.netbox
  • 13:06 urbanecm@deploy1002: albertoleoncio and urbanecm: Backport for Enable Extension:Translate on pt.wikisource.org (T339139) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:05 urbanecm: Create ext:Translate tables on ptwikisource (T339139)
  • 13:04 urbanecm@deploy1002: Started scap: Backport for Enable Extension:Translate on pt.wikisource.org (T339139)
  • 13:04 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts otrs1001.eqiad.wmnet
  • 13:04 urbanecm: Start foreachwikiindblist 'group2 & s1' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all on a tmux in mwmaint1002 (T315510)
  • 12:58 jclark@cumin1001: START - Cookbook sre.hosts.reboot-single for host parse1002.eqiad.wmnet
  • 12:57 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet
  • 12:47 aokoth@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts otrs1001.eqiad.wmnet
  • 12:46 aokoth@cumin1001: START - Cookbook sre.hosts.decommission for hosts otrs1001.eqiad.wmnet
  • 12:37 vgutierrez: depooling cp3050 - T339898
  • 12:32 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 12:32 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 12:26 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 12:25 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:27 jnuche@deploy1002: deploy aborted: (no justification provided) (duration: 01m 32s)
  • 11:26 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 11:15 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:10 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:10 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:57 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:30 ladsgroup@deploy1002: Finished scap: Backport for Stop setting wgLegacyEncdoing (T128150 T128151) (duration: 08m 06s)
  • 10:23 ladsgroup@deploy1002: ladsgroup: Backport for Stop setting wgLegacyEncdoing (T128150 T128151) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 10:22 ladsgroup@deploy1002: Started scap: Backport for Stop setting wgLegacyEncdoing (T128150 T128151)
  • 10:16 Lucas_WMDE: deployed patches for T339111
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:23 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 09:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:02 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 08:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1119.eqiad.wmnet with OS bookworm
  • 08:37 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 08:37 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 08:06 ariel@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1119.eqiad.wmnet with reason: host reimage
  • 07:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1119.eqiad.wmnet with reason: host reimage
  • 07:20 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host dumpsdata1003.eqiad.wmnet with OS bullseye
  • 07:18 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1119.eqiad.wmnet with OS bookworm
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834) (duration: 10m 25s)
  • 07:07 moritzm: installing openssl security updates on buster
  • 07:05 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for a 3rd group of 10 languages previously lacking MT (T337834)
  • 06:34 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1119.eqiad.wmnet with OS bookworm
  • 06:29 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1119.eqiad.wmnet with OS bookworm
  • 05:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14860
  • 05:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 14860
  • 00:14 zabe: Deployed patch for T330968

2023-06-19

  • 16:41 ladsgroup@deploy1002: Finished scap: Backport for Revert "Temporarily bring back legacy encoding in four wikis" (duration: 15m 19s)
  • 16:27 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Temporarily bring back legacy encoding in four wikis" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:26 ladsgroup@deploy1002: Started scap: Backport for Revert "Temporarily bring back legacy encoding in four wikis"
  • 16:22 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:09 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:50 elukey@cumin1001: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:ml-cache-codfw: Applying internode-encryption: all - elukey@cumin1001
  • 15:47 elukey@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Applying internode-encryption: all - elukey@cumin1001
  • 15:22 brett: Rolling reboot of codfw cache_text nodes to apply Linux update for CVE-2023-1872 - T335835
  • 15:07 moritzm: installing libxpm security updates
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 14:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:46 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:46 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:39 ladsgroup@deploy1002: Finished scap: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649) (duration: 20m 07s)
  • 14:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:26 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:24 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:23 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:20 ladsgroup@deploy1002: ladsgroup: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:19 ladsgroup@deploy1002: Started scap: Backport for file: Make pre-gen rendering of multi-page files (pdf, ...) serial (T337649)
  • 14:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:04 elukey: move varnishafka instances in eqsin to PKI
  • 13:44 kamila_: updated DNS: added discovery records for rest-gateway and device-analytics T335505
  • 13:14 moritzm: installing openjdk-17 security updates
  • 12:21 moritzm: uploaded wmfmariadbpy 0.10+deb12u1 T339835
  • 12:01 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:01 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:00 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:00 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:55 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:55 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:54 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:54 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1060.eqiad.wmnet
  • 11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:39 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1060.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 11:38 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1060.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 11:36 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:28 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1060.eqiad.wmnet
  • 11:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 100%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49449 and previous config saved to /var/cache/conftool/dbconfig/20230619-110207-ladsgroup.json
  • 10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1059.eqiad.wmnet
  • 10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:59 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1059.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:58 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1059.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:56 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1124.eqiad.wmnet with OS bookworm
  • 10:52 moritzm: imported megacli and ssacli to thirdparty/hwraid for bookworm-wikimedia T339847
  • 10:48 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1059.eqiad.wmnet
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 75%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49448 and previous config saved to /var/cache/conftool/dbconfig/20230619-104702-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49447 and previous config saved to /var/cache/conftool/dbconfig/20230619-103157-ladsgroup.json
  • 10:17 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:16 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 10%: Maint over (T338354)', diff saved to https://phabricator.wikimedia.org/P49446 and previous config saved to /var/cache/conftool/dbconfig/20230619-101653-ladsgroup.json
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1135 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P49445 and previous config saved to /var/cache/conftool/dbconfig/20230619-101623-ladsgroup.json
  • 10:15 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:15 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
  • 10:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
  • 10:00 claime: Switching test.wikipedia.org to mw-on-k8s - T337489
  • 09:51 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
  • 09:43 ladsgroup@deploy1002: Finished scap: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431) (duration: 10m 45s)
  • 09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1058.eqiad.wmnet
  • 09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:40 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1058.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 09:34 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:34 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:34 ladsgroup@deploy1002: ladsgroup: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 09:32 ladsgroup@deploy1002: Started scap: Backport for Enable new spam block page in all wikis except meta, commons, wikidata (T337431)
  • 09:30 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1058.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 09:30 ladsgroup@deploy1002: Finished scap: Backport for Blocked domains: Fix removing a domain via the special page (T337431) (duration: 08m 24s)
  • 09:27 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 09:22 ladsgroup@deploy1002: ladsgroup: Backport for Blocked domains: Fix removing a domain via the special page (T337431) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 09:21 ladsgroup@deploy1002: Started scap: Backport for Blocked domains: Fix removing a domain via the special page (T337431)
  • 09:21 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1058.eqiad.wmnet
  • 09:15 kart_: Updated MinT to 2023-06-16-042302-production, Updated people egress (T339271, T335491)
  • 09:12 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 09:12 ladsgroup@deploy1002: Finished scap: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431) (duration: 09m 53s)
  • 09:07 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 09:06 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 09:03 ladsgroup@deploy1002: ladsgroup: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 09:02 ladsgroup@deploy1002: Started scap: Backport for blocked domains: Make sure users can't bypass the list by using uppercase (T337431)
  • 09:01 ladsgroup@deploy1002: Finished scap: Backport for Temporarily bring back legacy encoding in four wikis (T128150) (duration: 07m 31s)
  • 09:00 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 08:55 ladsgroup@deploy1002: ladsgroup: Backport for Temporarily bring back legacy encoding in four wikis (T128150) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 08:53 ladsgroup@deploy1002: Started scap: Backport for Temporarily bring back legacy encoding in four wikis (T128150)
  • 08:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 08:49 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1124.eqiad.wmnet with OS bookworm
  • 08:45 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: First decompress gziped entries before iconv (T128150) (duration: 08m 52s)
  • 08:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
  • 08:38 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
  • 08:37 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: First decompress gziped entries before iconv (T128150) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 08:36 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: First decompress gziped entries before iconv (T128150)
  • 08:30 fabfur: rebooting cp3050 and cp3051 for kernel upgrade (T335835)
  • 08:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
  • 08:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
  • 08:20 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 19 hosts
  • 08:20 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 19 hosts
  • 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 776 hosts
  • 08:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 776 hosts
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
  • 08:03 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1124.eqiad.wmnet with OS bullseye
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Chad out of all services on: 1259 hosts
  • 07:54 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Chad out of all services on: 1259 hosts
  • 07:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
  • 07:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1124.eqiad.wmnet with reason: host reimage
  • 07:40 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bullseye
  • 07:39 marostegui@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1124.eqiad.wmnet with OS bookworm
  • 07:38 moritzm: uploaded wmfmariadbpy 0.10+deb12u1
  • 07:14 kartik@deploy1002: Finished scap: Backport for Use Parsoid for all Wikis for Content Translation (T339322) (duration: 11m 31s)
  • 07:11 marostegui@cumin1001: START - Cookbook sre.hosts.reimage for host db1124.eqiad.wmnet with OS bookworm
  • 07:04 kartik@deploy1002: kartik: Backport for Use Parsoid for all Wikis for Content Translation (T339322) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for Use Parsoid for all Wikis for Content Translation (T339322)
  • 06:39 urbanecm@deploy1002: Finished scap: Backport for Add throttle rule (duration: 07m 10s)
  • 06:32 urbanecm@deploy1002: Started scap: Backport for Add throttle rule
  • 05:34 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:14 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:49 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 04:29 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply

2023-06-18

2023-06-16

  • 22:25 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS buster
  • 21:29 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS buster
  • 21:08 sbassett: Deployed updated security mitigation for T336027
  • 21:04 brett: Finished rolling reboot of codfw cache_upload nodes to apply Linux update for CVE-2023-1872 - T335835
  • 19:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye
  • 19:03 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 19:03 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
  • 17:53 wfan: civicrm upgraded from d61220cd to b11db56d
  • 16:14 brett: Rolling reboot of codfw cache_upload nodes to apply Linux update for CVE-2023-1872 - T335835
  • 16:09 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 15:59 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 15:13 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:13 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
  • 15:12 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
  • 15:09 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:59 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
  • 14:57 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack IPv6 - aborrero@cumin1001"
  • 14:54 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:40 aborrero@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 14:40 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: hw troubleshooting
  • 13:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse1002.eqiad.wmnet with reason: hw troubleshooting
  • 13:54 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:52 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 13:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:51 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 12:16 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts parse1002.eqiad.wmnet
  • 12:08 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts parse1002.eqiad.wmnet
  • 12:07 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 12:02 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 12:00 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:00 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:56 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:55 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:53 aborrero@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:53 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:47 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:46 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
  • 11:15 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:14 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
  • 10:38 Amir1: root@cumin1001:/home/ladsgroup/software2/dbtools# cat s1.dblist | grep -v "#" | while read db; do cat tables_to_check.txt | while read table index; do echo "$db.$table"; db-compare $db $table $index db1135.eqiad.wmnet:3306 db1118 db1139:3311 || break 2; done ; done (T338354)
  • 09:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:02 akosiaris@cumin1001: START - Cookbook sre.dns.netbox
  • 08:47 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:41 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:35 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:25 akosiaris@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
  • 08:19 hashar@deploy1002: Finished scap: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934) (duration: 21m 08s)
  • 08:00 hashar@deploy1002: hashar: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:58 hashar@deploy1002: Started scap: Backport for Revert "Structured tasks: Fix toolbar rewriting" (T339292 T338934)
  • 07:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
  • 07:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
  • 01:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 01:50 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b8-codfw.mgmt.codfw.wmnet
  • 01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - pt1979@cumin2002"
  • 01:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b8-codfw - pt1979@cumin2002"
  • 01:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 01:31 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b8-codfw.mgmt.codfw.wmnet
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b7-codfw.mgmt.codfw.wmnet
  • 01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - pt1979@cumin2002"
  • 01:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b7-codfw - pt1979@cumin2002"
  • 01:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 01:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b7-codfw.mgmt.codfw.wmnet
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - pt1979@cumin2002"
  • 01:20 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b6-codfw - pt1979@cumin2002"
  • 01:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 01:17 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b6-codfw.mgmt.codfw.wmnet
  • 01:16 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 01:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - pt1979@cumin2002"
  • 01:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b5-codfw - pt1979@cumin2002"
  • 01:10 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 01:10 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b5-codfw.mgmt.codfw.wmnet
  • 01:07 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 01:06 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:06 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - pt1979@cumin2002"
  • 01:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b4-codfw - pt1979@cumin2002"
  • 01:01 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 01:01 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b4-codfw.mgmt.codfw.wmnet
  • 01:01 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - pt1979@cumin2002"
  • 00:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b3-codfw - pt1979@cumin2002"
  • 00:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1149.eqiad.wmnet with OS bullseye
  • 00:56 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:56 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b3-codfw.mgmt.codfw.wmnet
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - pt1979@cumin2002"
  • 00:46 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-b2-codfw - pt1979@cumin2002"
  • 00:42 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:42 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-b2-codfw.mgmt.codfw.wmnet
  • 00:38 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - pt1979@cumin2002"
  • 00:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a7-codfw - pt1979@cumin2002"
  • 00:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:25 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a7-codfw.mgmt.codfw.wmnet
  • 00:23 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2003.codfw.wmnet with OS bullseye
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:22 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - pt1979@cumin2002"
  • 00:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a6-codfw - pt1979@cumin2002"
  • 00:18 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:18 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a6-codfw.mgmt.codfw.wmnet
  • 00:17 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 00:16 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - pt1979@cumin2002"
  • 00:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a5-codfw - pt1979@cumin2002"
  • 00:11 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:11 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a5-codfw.mgmt.codfw.wmnet
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - pt1979@cumin2002"
  • 00:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a4-codfw - pt1979@cumin2002"
  • 00:06 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:06 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a4-codfw.mgmt.codfw.wmnet
  • 00:05 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a3-codfw.mgmt.codfw.wmnet
  • 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - pt1979@cumin2002"
  • 00:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a3-codfw - pt1979@cumin2002"
  • 00:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 00:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
  • 00:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a3-codfw.mgmt.codfw.wmnet

2023-06-15

  • 23:58 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2003.codfw.wmnet with reason: host reimage
  • 23:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
  • 23:56 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 23:55 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:55 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - pt1979@cumin2002"
  • 23:54 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a2-codfw - pt1979@cumin2002"
  • 23:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:51 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a2-codfw.mgmt.codfw.wmnet
  • 23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 23:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 23:46 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 23:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 23:44 pt1979@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['an-worker1153']
  • 23:44 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
  • 23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
  • 23:43 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
  • 23:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 23:42 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2003.codfw.wmnet with OS bullseye
  • 23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 23:31 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 23:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
  • 23:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
  • 23:26 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 23:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
  • 23:23 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
  • 23:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1151']
  • 23:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151']
  • 23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1152']
  • 23:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152']
  • 23:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
  • 23:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 23:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
  • 23:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
  • 23:10 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
  • 23:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1155']
  • 23:02 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155']
  • 23:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1156']
  • 22:54 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
  • 22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1156']
  • 22:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
  • 22:49 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1156']
  • 22:38 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1156']
  • 22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1155']
  • 22:33 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 22:30 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2002.codfw.wmnet with reason: host reimage
  • 22:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1155']
  • 22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1154']
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 22:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 22:14 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2002.codfw.wmnet with OS bullseye
  • 22:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:14 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 22:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1154']
  • 22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1153']
  • 22:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1153']
  • 21:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 21:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1152']
  • 21:40 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1152']
  • 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1151']
  • 21:30 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1151']
  • 21:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 21:24 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cassandra-dev2001.codfw.wmnet with reason: host reimage
  • 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 21:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:19 jhancock@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 21:19 jhancock@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 21:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
  • 21:14 thcipriani: parse1002 having ssh connection problems during backport window
  • 21:13 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:13 thcipriani@deploy1002: Finished scap: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254) (duration: 16m 09s)
  • 21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
  • 21:12 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1150']
  • 21:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 21:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1150']
  • 21:08 eevans@cumin1001: START - Cookbook sre.hosts.reimage for host cassandra-dev2001.codfw.wmnet with OS bullseye
  • 20:58 thcipriani@deploy1002: thcipriani and matmarex: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:57 thcipriani@deploy1002: Started scap: Backport for Revert "Targets: Use align:'after' instead of actionGroups" (T339292), HelpCompletionTool wasn't added to extension.json (T338254)
  • 20:54 thcipriani@deploy1002: Finished scap: Backport for [uzwiki] Add the 'patroller' usergroup (T338826) (duration: 15m 27s)
  • 20:52 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1150']
  • 20:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
  • 20:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 20:40 thcipriani@deploy1002: superpes and thcipriani: Backport for [uzwiki] Add the 'patroller' usergroup (T338826) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:38 thcipriani@deploy1002: Started scap: Backport for [uzwiki] Add the 'patroller' usergroup (T338826)
  • 20:32 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1149']
  • 20:30 thcipriani@deploy1002: Finished scap: Backport for Remove GDI survey from RU and JA wikis. (T338926) (duration: 16m 30s)
  • 20:27 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 20:15 thcipriani@deploy1002: essexigyan and thcipriani: Backport for Remove GDI survey from RU and JA wikis. (T338926) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:13 thcipriani@deploy1002: Started scap: Backport for Remove GDI survey from RU and JA wikis. (T338926)
  • 19:06 ladsgroup@deploy1002: Finished scap: Backport for Enable blocked domain list in testwiki and fawiki (T337431) (duration: 17m 40s)
  • 18:50 ladsgroup@deploy1002: ladsgroup: Backport for Enable blocked domain list in testwiki and fawiki (T337431) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 18:48 ladsgroup@deploy1002: Started scap: Backport for Enable blocked domain list in testwiki and fawiki (T337431)
  • 18:48 ryankemper: [WDQS] `ryankemper@wdqs2012:~$ sudo pool`
  • 18:44 ryankemper@puppetmaster1001: conftool action : set/weight=0:pooled=inactive; selector: name=wdqs2021.*
  • 18:44 ladsgroup@deploy1002: Finished scap: Backport for BlockedDomains: Add logging in case of hit (T337431) (duration: 30m 33s)
  • 18:25 ladsgroup@deploy1002: ladsgroup: Backport for BlockedDomains: Add logging in case of hit (T337431) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:13 ladsgroup@deploy1002: Started scap: Backport for BlockedDomains: Add logging in case of hit (T337431)
  • 17:13 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:13 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:12 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:11 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:02 joal@deploy1002: Finished deploy [airflow-dags/analytics@bba655e]: (no justification provided) (duration: 00m 11s)
  • 17:02 joal@deploy1002: Started deploy [airflow-dags/analytics@bba655e]: (no justification provided)
  • 17:00 jnuche@deploy1002: Installation of scap version "4.53.0" completed for 594 hosts
  • 16:59 jnuche@deploy1002: Installing scap version "4.53.0" for 594 hosts
  • 16:55 jnuche@deploy1002: Installing scap version "4.53.0" for 595 hosts
  • 16:53 jnuche@deploy1002: Installing scap version "4.53.0" for 595 hosts
  • 16:52 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudservices2004-dev.codfw.wmnet with OS bullseye
  • 16:52 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 16:51 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 16:38 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
  • 16:38 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
  • 16:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
  • 16:15 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on acmechief2001.codfw.wmnet with reason: https://letsencrypt.status.io/pages/55957a99e800baa4470002da
  • 16:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief2001.codfw.wmnet
  • 16:14 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for acmechief2001.codfw.wmnet
  • 15:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:56 joal@deploy1002: Finished deploy [airflow-dags/analytics@c584b62]: (no justification provided) (duration: 00m 12s)
  • 15:56 joal@deploy1002: Started deploy [airflow-dags/analytics@c584b62]: (no justification provided)
  • 15:51 mutante: phabricator - made jnuche (https://phabricator.wikimedia.org/people/manage/32076/) an Administrator T339174
  • 15:46 milimetric@deploy1002: Finished deploy [analytics/refinery@106bf30] (thin): Patch for HiveToDruid with snapshots [thin] (duration: 00m 04s)
  • 15:45 milimetric@deploy1002: Started deploy [analytics/refinery@106bf30] (thin): Patch for HiveToDruid with snapshots [thin]
  • 15:44 claime: mw2323.codfw.wmnet repooled following T326564
  • 15:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2323.codfw.wmnet
  • 15:44 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2323.codfw.wmnet
  • 15:44 milimetric@deploy1002: Finished deploy [analytics/refinery@106bf30]: Patch for HiveToDruid with snapshots (duration: 07m 01s)
  • 15:43 claime: mw2324.codfw.wmnet repooled following T326564
  • 15:39 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2323.codfw.wmnet
  • 15:37 milimetric@deploy1002: Started deploy [analytics/refinery@106bf30]: Patch for HiveToDruid with snapshots
  • 15:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2324.codfw.wmnet
  • 15:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2324.codfw.wmnet
  • 15:36 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2324.codfw.wmnet
  • 15:33 cgoubert@cumin1001: conftool action : set/pooled=no; selector: name=mw2324.codfw.wmnet
  • 15:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 15:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 15:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 15:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 15:28 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 15:27 claime: mw2411.codfw.wmnet repooled following T326564
  • 15:26 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2411.codfw.wmnet
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 15:24 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2411.codfw.wmnet
  • 15:24 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2411.codfw.wmnet
  • 15:23 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 15:22 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 15:21 claime: mw2401.codfw.wmnet repooled following T326564
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:17 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:17 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:16 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:16 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:16 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2401.codfw.wmnet
  • 15:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2401.codfw.wmnet
  • 15:16 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2401.codfw.wmnet
  • 15:14 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:14 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:14 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:14 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:13 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:12 claime: Deploying new mediawiki chart: Gracefully handle termination - T331609
  • 15:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:10 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:09 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:55 claime: Powering down mw2401 mw2411 mw2324 mw2323 - T326564
  • 14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2323.codfw.wmnet
  • 14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2324.codfw.wmnet
  • 14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2411.codfw.wmnet
  • 14:54 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2401.codfw.wmnet
  • 14:53 claime: Depooling mw2401 mw2411 mw2324 mw2323 as inactive for powerdown - T326564
  • 14:53 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2323.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2323.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2324.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2324.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2411.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2411.codfw.wmnet with reason: powering off for T326564
  • 14:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw2401.codfw.wmnet with reason: powering off for T326564
  • 14:51 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on mw2401.codfw.wmnet with reason: powering off for T326564
  • 14:40 Lucas_WMDE: UTC afternoon backport+config window done (maintenance script runs are ongoing and “will probably take a few weeks to complete”)
  • 14:39 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s6' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s6-exited-$?` in tmux on mwmaint1002 (T315510)
  • 14:39 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s5' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s5-exited-$?` in tmux on mwmaint1002 (T315510)
  • 14:35 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s3' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s3-exited-$?` in tmux on mwmaint1002 (T315510)
  • 14:34 Lucas_WMDE: Start `foreachwikiindblist 'group2 & s2' DiscussionTools:persistRevisionThreadItems --current --all; touch ~/T315510-s2-exited-$?` in tmux on mwmaint1002 (T315510)
  • 14:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527) (duration: 09m 53s)
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 14:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 14:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wsung: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:19 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Implement Language Converter for yue (Cantonese)" (T59106 T337527)
  • 14:01 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886) (duration: 08m 44s)
  • 14:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5032.eqsin.wmnet
  • 14:00 moritzm: remove ruby2.5 2.5.5-3+deb10u5+wmf1 (superseded by corrected Debian build 2.5.5-3+deb10u6) T338294
  • 14:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5024.eqsin.wmnet
  • 13:55 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:54 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:54 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:54 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:54 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:54 lucaswerkmeister-wmde@deploy1002: reedy and lucaswerkmeister-wmde: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:54 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:53 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:53 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "Temporarily disable UCoC link from non tech wikis" (T280886)
  • 13:51 moritzm: installing ruby2.5 security updates
  • 13:49 fabfur: reboot cp5024 and cp5032 for kernel upgrade (T335835)
  • 13:49 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5024.eqsin.wmnet
  • 13:49 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5032.eqsin.wmnet
  • 13:48 samtar@deploy1002: Finished scap: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207) (duration: 09m 40s)
  • 13:40 samtar@deploy1002: kharlan and samtar: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:39 samtar@deploy1002: Started scap: Backport for Section images: Fix scrolling to placeholder (T335209), Section images: update rtl asset with flipped question mark (T335207)
  • 13:28 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5031.eqsin.wmnet
  • 13:24 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5023.eqsin.wmnet
  • 13:21 daniel@deploy1002: Finished scap: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529) (duration: 11m 48s)
  • 13:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 13:13 fabfur: reboot cp5023 and cp5031 for kernel upgrade (T335835)
  • 13:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5031.eqsin.wmnet
  • 13:13 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5023.eqsin.wmnet
  • 13:10 daniel@deploy1002: daniel: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:09 daniel@deploy1002: Started scap: Backport for Switch VisualEditor to bypass RESTbase on all wikis. (T320529)
  • 13:08 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:08 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES codfw cluster: Roll restart of ORES's daemons.
  • 13:07 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:05 elukey@cumin1001: END (PASS) - Cookbook sre.ores.roll-restart-workers (exit_code=0) for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 12:59 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:58 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:57 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:53 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5022.eqsin.wmnet
  • 12:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5030.eqsin.wmnet
  • 12:48 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 12:45 elukey@cumin1001: START - Cookbook sre.ores.roll-restart-workers for ORES eqiad cluster: Roll restart of ORES's daemons.
  • 12:40 fabfur: reboot cp5022 and cp5030 for kernel upgrade (T335835)
  • 12:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5022.eqsin.wmnet
  • 12:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5030.eqsin.wmnet
  • 12:35 moritzm: installing ffmpeg security updates
  • 12:34 joal@deploy1002: Finished deploy [airflow-dags/analytics@d458338]: (no justification provided) (duration: 00m 09s)
  • 12:34 joal@deploy1002: Started deploy [airflow-dags/analytics@d458338]: (no justification provided)
  • 12:27 moritzm: installing containerd security updates
  • 12:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5029.eqsin.wmnet
  • 12:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5021.eqsin.wmnet
  • 12:14 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 12:11 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudservices2004-dev.codfw.wmnet with reason: host reimage
  • 12:07 fabfur: reboot cp5021 and cp5029 for kernel upgrade (T335835)
  • 12:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5021.eqsin.wmnet
  • 12:06 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5029.eqsin.wmnet
  • 12:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 12:02 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 11:58 moritzm: restarting exim on lists1001
  • 11:52 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bullseye
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 11:51 claime: Repooled parse1002.eqiad.wmnet after powercycle
  • 11:49 moritzm: restarting slapd on seagorgium/serpens
  • 11:48 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 11:46 ladsgroup@deploy1002: Finished scap: Backport for Switch five large wikis to extlinks read new (T335343) (duration: 09m 10s)
  • 11:45 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 11:40 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on parse1002.eqiad.wmnet with reason: Powercycle
  • 11:40 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on parse1002.eqiad.wmnet with reason: Powercycle
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema
  • 11:39 claime: parse1002 not responding to ssh or console, depooled
  • 11:38 ladsgroup@deploy1002: ladsgroup: Backport for Switch five large wikis to extlinks read new (T335343) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 11:37 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: dc=eqiad,name=parse1002.eqiad.wmnet
  • 11:37 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema
  • 11:37 ladsgroup@deploy1002: Started scap: Backport for Switch five large wikis to extlinks read new (T335343)
  • 11:32 ladsgroup@deploy1002: Finished scap: Backport for Remove nlwiki from windows-1252 encoding (T128154) (duration: 17m 38s)
  • 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 11:29 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wcqs-public
  • 11:28 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 11:16 ladsgroup@deploy1002: ladsgroup: Backport for Remove nlwiki from windows-1252 encoding (T128154) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 11:14 ladsgroup@deploy1002: Started scap: Backport for Remove nlwiki from windows-1252 encoding (T128154)
  • 11:11 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudservices2004-dev.codfw.wmnet with OS bullseye
  • 11:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5028.eqsin.wmnet
  • 11:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5020.eqsin.wmnet
  • 10:58 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:57 fabfur: reboot cp5020 and cp5028 for kernel upgrade (T335835)
  • 10:57 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5020.eqsin.wmnet
  • 10:57 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5028.eqsin.wmnet
  • 10:56 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
  • 10:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:34 klausman@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 10:34 klausman@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 10:30 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 10:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:22 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 10:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:18 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:18 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 10:16 klausman@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 10:15 klausman@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 10:14 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 10:09 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:07 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx rolling restart_daemons on A:wdqs-all
  • 10:06 btullis: removed hadoop packages incorrectly labelled for i386 in thirdparty/bigtop15 bullseye-wikimedia
  • 10:04 Amir1: root@clouddb1021.eqiad.wmnet[metawiki]> ALTER TABLE pagelinks ROW_FORMAT=COMPRESSED; (T337961)
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 10:02 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:57 moritzm: restarting FPM on mw canaries
  • 09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:53 moritzm: installing openssl security updates on buster
  • 09:51 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp5019.eqsin.wmnet
  • 09:47 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5027.eqsin.wmnet
  • 09:43 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.private.codfw.wikimedia.cloud on all recursors
  • 09:43 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.private.codfw.wikimedia.cloud on all recursors
  • 09:42 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:42 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 09:41 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 09:39 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:34 fabfur: reboot cp5019 and cp5027 for kernel upgrade (T335835)
  • 09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5019.eqsin.wmnet
  • 09:34 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5027.eqsin.wmnet
  • 09:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5018.eqsin.wmnet
  • 09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5026.eqsin.wmnet
  • 09:08 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudservices2004-dev.codfw.wmnet with OS bullseye
  • 09:07 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.mgmt.codfw.wmnet on all recursors
  • 09:07 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.mgmt.codfw.wmnet on all recursors
  • 09:06 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudservices2004-dev.codfw.wmnet on all recursors
  • 09:06 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudservices2004-dev.codfw.wmnet on all recursors
  • 09:05 elukey: move varnishkafka instances in ulsfo to PKI - T337825
  • 09:05 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:05 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev - aborrero@cumin2002"
  • 09:04 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev - aborrero@cumin2002"
  • 09:02 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 09:01 fabfur: reboot cp5018 and cp5026 for kernel upgrade (T335835)
  • 09:01 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5018.eqsin.wmnet
  • 09:01 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5026.eqsin.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 09:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1492.eqiad.wmnet
  • 09:00 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1492.eqiad.wmnet
  • 08:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1492.eqiad.wmnet
  • 08:59 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw1492.eqiad.wmnet
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1492.eqiad.wmnet with OS buster
  • 08:52 elukey@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1001"
  • 08:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5025.eqsin.wmnet
  • 08:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp5017.eqsin.wmnet
  • 08:20 fabfur: reboot cp5017 and cp5025 for kernel upgrade (T335835)
  • 08:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5017.eqsin.wmnet
  • 08:20 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp5025.eqsin.wmnet
  • 08:15 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 08:13 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.13 refs T337527
  • 08:13 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:11 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:10 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 07:55 elukey@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1001"
  • 07:34 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49438 and previous config saved to /var/cache/conftool/dbconfig/20230615-073248-root.json
  • 07:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
  • 07:24 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1492.eqiad.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49437 and previous config saved to /var/cache/conftool/dbconfig/20230615-071744-root.json
  • 07:11 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host mw1492.eqiad.wmnet with OS buster
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49436 and previous config saved to /var/cache/conftool/dbconfig/20230615-070239-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49435 and previous config saved to /var/cache/conftool/dbconfig/20230615-064734-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49434 and previous config saved to /var/cache/conftool/dbconfig/20230615-063230-root.json
  • 06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox interface ID 2066
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox interface ID 2066
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49433 and previous config saved to /var/cache/conftool/dbconfig/20230615-061725-root.json
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49432 and previous config saved to /var/cache/conftool/dbconfig/20230615-060220-root.json
  • 05:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1168 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49431 and previous config saved to /var/cache/conftool/dbconfig/20230615-054716-root.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1168 to upgrade to 10.6.14 T338918', diff saved to https://phabricator.wikimedia.org/P49430 and previous config saved to /var/cache/conftool/dbconfig/20230615-053318-root.json

2023-06-14

  • 23:38 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 23:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 21:38 mutante: phabricator - made dancy (https://phabricator.wikimedia.org/people/manage/25411/) an administrator (T339174)
  • 21:02 taavi@deploy1002: Finished scap: Backport for Fix thumb styling on file description page (T337804) (duration: 10m 44s)
  • 20:54 taavi@deploy1002: arlolra and taavi: Backport for Fix thumb styling on file description page (T337804) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:52 taavi@deploy1002: Started scap: Backport for Fix thumb styling on file description page (T337804)
  • 20:34 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
  • 20:34 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
  • 20:29 taavi@deploy1002: Finished scap: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974) (duration: 10m 23s)
  • 20:21 taavi@deploy1002: taavi and albertoleoncio: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:19 taavi@deploy1002: Started scap: Backport for Enable mobile page tabs for everyone in ptwikisource. (T338974)
  • 20:12 taavi@deploy1002: Finished scap: Backport for simplewiki: Remove "changetags" from registered user (T339124) (duration: 08m 55s)
  • 20:10 mutante: https://ticket.wikimedia.org down for migration
  • 20:06 taavi@deploy1002: taavi and stang: Backport for simplewiki: Remove "changetags" from registered user (T339124) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:03 taavi@deploy1002: Started scap: Backport for simplewiki: Remove "changetags" from registered user (T339124)
  • 20:01 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
  • 20:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on otrs1001.eqiad.wmnet with reason: Replacing Host
  • 18:29 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@3d6caed]: Deploying mostly to rerun druid loading for mediawiki history reduced (duration: 00m 09s)
  • 18:29 milimetric@deploy1002: Started deploy [airflow-dags/analytics@3d6caed]: Deploying mostly to rerun druid loading for mediawiki history reduced
  • 18:11 moritzm: installing libssh security updates on buster
  • 17:04 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 17:03 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 17:03 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 17:02 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 17:01 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 17:01 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 16:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 16:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw1492.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:48 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host mw1492.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:42 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided) (duration: 02m 03s)
  • 15:40 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@0c82f2d] (releasing): (no justification provided)
  • 15:00 lucaswerkmeister-wmde: Deployed security patch for T250720
  • 14:53 lucaswerkmeister-wmde: Deployed security patch for T250720
  • 14:36 Amir1: mwscript findBadBlobs.php --wiki=nlwiki --revisions 880583,880584,880585,880586,880587,880588,880589,880590,880591,880592,880593,880594,880595,880596,880597,880598,880599,880600,880601,880602 --mark "T128154"
  • 14:33 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:32 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:30 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:22 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Declare mediawiki.page_outlink_topic_prediction_change.v1 stream - T328899 (duration: 10m 25s)
  • 14:19 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:17 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 14:17 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 14:16 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:15 bblack: dns2006: updating gdnsd package
  • 14:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:59 topranks: adjusting port buffer partition asw2-esams T284592
  • 13:58 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 13:58 topranks: adjusting port buffer partition asw1-eqsin T284592
  • 13:58 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 13:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 13:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 13:57 topranks: adjusting port buffer partition asw2-ulsfo T284592
  • 13:53 topranks: adjusting port buffer partition asw-d-codfw T284592
  • 13:52 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:49 topranks: adjusting port buffer partition asw-c-codfw T284592
  • 13:46 topranks: adjusting port buffer partition asw-b-codfw T284592
  • 13:44 moritzm: imported jenkins 2.401.1 to thirdparty/ci for buster-wikimedia
  • 13:42 topranks: adjusting port buffer partition asw-a-codfw T284592
  • 13:42 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:14 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:05 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:57 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:47 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.13 refs T337527 (duration: 06m 10s)
  • 12:41 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.13 refs T337527
  • 12:29 ladsgroup@deploy1002: Finished scap: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094) (duration: 08m 21s)
  • 12:23 ladsgroup@deploy1002: ladsgroup: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 12:21 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw1492.eqiad.wmnet
  • 12:21 ladsgroup@deploy1002: Started scap: Backport for Fix cases of LogicException in $update->getParserOutputForMetaData() (T339094)
  • 12:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1492.eqiad.wmnet
  • 12:01 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix mgmt for cloudservices2004-dev - jbond@cumin1001"
  • 12:00 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix mgmt for cloudservices2004-dev - jbond@cumin1001"
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 11:11 XioNoX: eqiad row D, move VRRP primary back to cr2 - T313463
  • 11:06 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "fix mgmt - jbond@cumin1001"
  • 11:05 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "fix mgmt - jbond@cumin1001"
  • 11:00 XioNoX: disable cr2<->row D link for link migration - T313463
  • 10:40 XioNoX: eqiad row D, move VRRP primary to cr1 - T313463
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1040-1043].eqiad.wmnet
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1040-1043].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1001"
  • 10:24 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1040-1043].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1001"
  • 10:22 mvernon@cumin1001: START - Cookbook sre.dns.netbox
  • 10:11 XioNoX: disable cr1<->row D link for link migration - T313463
  • 10:03 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.13 refs T337527
  • 10:03 mvernon@cumin1001: START - Cookbook sre.hosts.decommission for hosts ms-be[1040-1043].eqiad.wmnet
  • 10:01 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 09:54 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:28 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.13 refs T337527 (duration: 06m 56s)
  • 09:21 moritzm: installing php7.4 security updates
  • 09:21 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.13 refs T337527
  • 09:11 hashar: zuul: rolled back config changes for T309376 and restarted Zuul. CI is back up.
  • 09:00 tgr_: UTC morning deploys done
  • 08:59 tgr@deploy1002: Finished scap: Backport for Section images: Pass section parameters to VE in add image tasks (T339046) (duration: 07m 55s)
  • 08:58 hashar: Rolling back Zuul config change and restarting Zuul to clear ssh connections
  • 08:53 tgr@deploy1002: tgr: Backport for Section images: Pass section parameters to VE in add image tasks (T339046) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 08:51 tgr@deploy1002: Started scap: Backport for Section images: Pass section parameters to VE in add image tasks (T339046)
  • 08:51 hashar: Restarting Zuul to apply config change for T309376
  • 08:48 tgr@deploy1002: Finished scap: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927) (duration: 08m 14s)
  • 08:41 tgr@deploy1002: tgr: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:40 tgr@deploy1002: Started scap: Backport for Revert "jquery.makeCollapsible: Use `unset: all` on buttons" (T333357 T338927)
  • 08:18 tgr@deploy1002: Finished scap: Backport for Structured tasks: Fix toolbar rewriting (T338934) (duration: 12m 52s)
  • 08:07 tgr@deploy1002: tgr: Backport for Structured tasks: Fix toolbar rewriting (T338934) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:05 tgr@deploy1002: Started scap: Backport for Structured tasks: Fix toolbar rewriting (T338934)
  • 07:46 tgr_: backporting https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/929966 (can't edit wikitech due to DB issues)
  • 07:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 29
  • 07:40 ayounsi@cumin2002: START - Cookbook sre.network.debug for Netbox circuit ID 29
  • 07:32 tgr_: test
  • 07:31 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123) (duration: 09m 54s)
  • 07:23 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:21 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 3 Wikipedias (T338123)
  • 07:19 kartik@deploy1002: Backport cancelled.
  • 07:18 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669) (duration: 13m 35s)
  • 07:06 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for a 2nd group of 9 languages previously lacking machine translation (T337669)
  • 07:04 marostegui: Test
  • 04:34 ejegg: civicrm upgraded from fd87e0df to d61220cd
  • 04:01 ejegg: civicrm upgraded from a675c2c9 to fd87e0df
  • 01:57 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:57 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:50 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 01:47 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2021.codfw.wmnet with reason: host reimage
  • 01:41 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 01:05 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2021.codfw.wmnet with OS bullseye
  • 00:09 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2021.codfw.wmnet with OS bullseye

2023-06-13

  • 23:57 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 23:40 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 23:00 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 22:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 22:14 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 21:26 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:06 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:06 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 20:57 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on people1004.eqiad.wmnet with reason: first setup
  • 20:57 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on people1004.eqiad.wmnet with reason: first setup
  • 20:55 ebernhardson@deploy1002: Finished scap: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194) (duration: 07m 36s)
  • 20:49 ebernhardson@deploy1002: ebernhardson: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1004.eqiad.wmnet
  • 20:48 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people1004.eqiad.wmnet with OS bookworm
  • 20:48 ebernhardson@deploy1002: Started scap: Backport for cirrus: Enable analysis chain deduplication for wikibase (T334194)
  • 20:34 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people1004.eqiad.wmnet with reason: host reimage
  • 20:31 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on people1004.eqiad.wmnet with reason: host reimage
  • 20:29 urbanecm@deploy1002: Finished scap: Backport for Exclude after-aligned tools when creating target widgets (T338978) (duration: 08m 10s)
  • 20:22 urbanecm@deploy1002: matmarex and urbanecm: Backport for Exclude after-aligned tools when creating target widgets (T338978) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:21 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people1004.eqiad.wmnet with OS bookworm
  • 20:20 urbanecm@deploy1002: Started scap: Backport for Exclude after-aligned tools when creating target widgets (T338978)
  • 20:18 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 20:17 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 20:17 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 20:17 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 20:17 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:16 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 20:14 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 20:09 urbanecm: Start `foreachwikiindblist 'group2 & s7' extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --current --all` in a tmux session on mwmaint1002 (T315510)
  • 20:01 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:01 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
  • 19:56 eileen: civicrm: revision a675c2c9, config c83f9a1a
  • 19:50 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2003.codfw.wmnet
  • 19:50 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people2003.codfw.wmnet with OS bookworm
  • 19:38 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people2003.codfw.wmnet with reason: host reimage
  • 19:35 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on people2003.codfw.wmnet with reason: host reimage
  • 19:33 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:18 bblack: dns4004 - downtime removed, agent back to normal, etc
  • 19:17 jclark@cumin1001: START - Cookbook sre.hosts.provision for host an-worker1150.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:17 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people2003.codfw.wmnet with OS bookworm
  • 19:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 19:08 bblack: dns4004: downtiming and stopping agent for a bit, to test some new software
  • 19:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
  • 19:08 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
  • 19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:08 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:48 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:47 Amir1: root@clouddb1021.eqiad.wmnet[commonswiki]> ALTER TABLE externallinks ROW_FORMAT=COMPRESSED; (T337961)
  • 18:44 ladsgroup@deploy1002: Finished scap: Backport for Retrieve external links from PreparedUpdate (T65632 T264104) (duration: 12m 18s)
  • 18:43 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:43 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2003.codfw.wmnet
  • 18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1017.eqiad.wmnet with OS buster
  • 18:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:35 mutante: grafana2001 - apt-get clean
  • 18:34 ladsgroup@deploy1002: ladsgroup: Backport for Retrieve external links from PreparedUpdate (T65632 T264104) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:32 ladsgroup@deploy1002: Started scap: Backport for Retrieve external links from PreparedUpdate (T65632 T264104)
  • 18:30 mutante: ganeti2021 - deleting VM people2003
  • 18:30 mutante: ganeti1028 - deleting VM people2003
  • 18:29 mutante: ganeti1028 - deleting VM people1004
  • 18:29 Amir1: root@clouddb1021.eqiad.wmnet[ruwikinews]> ALTER TABLE externallinks ROW_FORMAT=COMPRESSED; (T337961)
  • 18:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 18:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb1021.eqiad.wmnet with reason: T337961
  • 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1016.eqiad.wmnet with OS buster
  • 18:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:21 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
  • 17:56 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 17:55 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host snapshot1016.eqiad.wmnet with OS buster
  • 17:38 ladsgroup@deploy1002: Finished scap: Backport for Make old_links retrieval cleaner (duration: 18m 09s)
  • 17:28 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009f (duration: 01m 25s)
  • 17:27 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009f
  • 17:22 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009 (duration: 00m 02s)
  • 17:22 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] - to stat1009
  • 17:22 ladsgroup@deploy1002: ladsgroup: Backport for Make old_links retrieval cleaner synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 17:22 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c337e2f] (duration: 01m 43s)
  • 17:20 ladsgroup@deploy1002: Started scap: Backport for Make old_links retrieval cleaner
  • 17:20 otto@deploy1002: Started deploy [analytics/refinery@c337e2f] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@c337e2f]
  • 17:20 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f] (thin): Regular analytics weekly train THIN [analytics/refinery@c337e2f] (duration: 00m 04s)
  • 17:20 otto@deploy1002: Started deploy [analytics/refinery@c337e2f] (thin): Regular analytics weekly train THIN [analytics/refinery@c337e2f]
  • 17:13 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] (duration: 07m 51s)
  • 17:06 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f]
  • 17:05 otto@deploy1002: Finished deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f] (duration: 24m 03s)
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1017.eqiad.wmnet with reason: host reimage
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1017.eqiad.wmnet with reason: host reimage
  • 16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 16:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 16:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 16:41 otto@deploy1002: Started deploy [analytics/refinery@c337e2f]: Regular analytics weekly train [analytics/refinery@c337e2f]
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 16:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1017.eqiad.wmnet with OS buster
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 16:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 16:19 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:19 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1017.eqiad.wmnet with OS buster
  • 16:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 16:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 15:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
  • 15:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 15:51 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1016.eqiad.wmnet with reason: host reimage
  • 15:45 SandraEbele: Deployed refinery-source using jenkins
  • 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['an-worker1149']
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 15:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 15:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 15:28 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
  • 15:28 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 15:28 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 15:27 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host snapshot1016.eqiad.wmnet with OS buster
  • 15:21 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
  • 15:20 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 15:16 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 15:15 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 15:14 SandraEbele: deploying refinery source
  • 15:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 15:02 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 15:01 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 15:00 elukey: run kafka re-assign partitions for eqiad.change-prop.transcludes.resource-change on kafka-main1001 - T338357
  • 14:59 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:58 otto@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:58 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 14:57 otto@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 14:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:47 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
  • 14:16 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:05 fabfur: reboot cp4044 and cp4052 for kernel upgrade (T335835)
  • 14:05 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
  • 14:05 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
  • 14:03 claime: Revert noc.wikimedia.org to eqiad, running authdns-update - T331634
  • 13:56 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 13:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:42 urbanecm@deploy1002: Finished scap: Backport for Section images: Fix image placeholder alignment for RTL content (T338837) (duration: 10m 29s)
  • 13:41 sukhe: disable puppet on R:Class bird::anycast_healthchecker to merge CR 928804
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
  • 13:35 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4043.ulsfo.wmnet
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 13:33 urbanecm@deploy1002: kharlan and urbanecm: Backport for Section images: Fix image placeholder alignment for RTL content (T338837) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 13:31 urbanecm@deploy1002: Started scap: Backport for Section images: Fix image placeholder alignment for RTL content (T338837)
  • 13:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 13:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 13:25 fabfur: reboot cp4043 and cp4051 for kernel upgrade (T335835)
  • 13:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4043.ulsfo.wmnet
  • 13:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:21 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 13:21 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 13:21 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:20 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 13:20 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 13:19 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:18 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:15 urbanecm@deploy1002: Finished scap: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724) (duration: 09m 29s)
  • 13:07 urbanecm@deploy1002: migr and urbanecm: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Drop disabling removed Datatype (T332724), Testwikidatawiki: Enable new EntitySchema Datatype (T332724)
  • 13:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
  • 13:01 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4042.ulsfo.wmnet
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49421 and previous config saved to /var/cache/conftool/dbconfig/20230613-130129-ladsgroup.json
  • 13:01 moritzm: installing nbconvert security updates
  • 12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 12:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 12:51 fabfur: reboot cp4042 and cp4050 for kernel upgrade (T335835)
  • 12:51 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4042.ulsfo.wmnet
  • 12:51 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
  • 12:46 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49420 and previous config saved to /var/cache/conftool/dbconfig/20230613-124623-ladsgroup.json
  • 12:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:45 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:44 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:44 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 12:44 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P49419 and previous config saved to /var/cache/conftool/dbconfig/20230613-123117-ladsgroup.json
  • 12:29 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4049.ulsfo.wmnet
  • 12:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4041.ulsfo.wmnet
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:18 fabfur: reboot cp4041 and cp4049 for kernel upgrade (T335835)
  • 12:18 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4041.ulsfo.wmnet
  • 12:18 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49418 and previous config saved to /var/cache/conftool/dbconfig/20230613-121611-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 12:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 12:09 hashar: Restarted Zuul CI due to T309376
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:45 Amir1: cat wikis_having_stubs | xargs -I {} bash -c 'echo {}; touch /home/ladsgroup/{}.undo.sql; chmod 777 /home/ladsgroup/{}.undo.sql; mwscript maintenance/storage/moveToExternal.php --wiki={} --end 200000000 --undo /home/ladsgroup/{}.undo.sql DB cluster26' (T299387)
  • 11:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
  • 11:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4040.ulsfo.wmnet
  • 11:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049)
  • 11:40 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T329049)
  • 11:37 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049)
  • 11:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:36 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: Also check for utf8 encoding before trying to convert (duration: 09m 59s)
  • 11:35 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T329049)
  • 11:32 fabfur: reboot cp4040 and cp4048 for kernel upgrade (T335835)
  • 11:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4040.ulsfo.wmnet
  • 11:32 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49417 and previous config saved to /var/cache/conftool/dbconfig/20230613-113111-root.json
  • 11:28 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: Also check for utf8 encoding before trying to convert synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: Also check for utf8 encoding before trying to convert
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 11:20 ladsgroup@deploy1002: Finished scap: Backport for Set medium wikis to read new for externallinks (T335343) (duration: 10m 09s)
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 11:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49416 and previous config saved to /var/cache/conftool/dbconfig/20230613-111607-root.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P49415 and previous config saved to /var/cache/conftool/dbconfig/20230613-111549-ladsgroup.json
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@deploy1002: ladsgroup: Backport for Set medium wikis to read new for externallinks (T335343) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 11:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 11:10 ladsgroup@deploy1002: Started scap: Backport for Set medium wikis to read new for externallinks (T335343)
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49414 and previous config saved to /var/cache/conftool/dbconfig/20230613-110746-ladsgroup.json
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 11:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49413 and previous config saved to /var/cache/conftool/dbconfig/20230613-110102-root.json
  • 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 10:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 10:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
  • 10:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4039.ulsfo.wmnet
  • 10:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49412 and previous config saved to /var/cache/conftool/dbconfig/20230613-105240-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:46 fabfur: reboot cp4039 and cp4047 for kernel upgrade (T335835)
  • 10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49411 and previous config saved to /var/cache/conftool/dbconfig/20230613-104557-root.json
  • 10:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
  • 10:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4039.ulsfo.wmnet
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 10:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49410 and previous config saved to /var/cache/conftool/dbconfig/20230613-103734-ladsgroup.json
  • 10:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 10:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49409 and previous config saved to /var/cache/conftool/dbconfig/20230613-103053-root.json
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49408 and previous config saved to /var/cache/conftool/dbconfig/20230613-102227-ladsgroup.json
  • 10:18 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
  • 10:18 Amir1: killed extensions/MachineVision/maintenance/prioritizeFilesWithTemplate.php; it was blocking a depool in s4
  • 10:17 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4038.ulsfo.wmnet
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49407 and previous config saved to /var/cache/conftool/dbconfig/20230613-101548-root.json
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49406 and previous config saved to /var/cache/conftool/dbconfig/20230613-101310-ladsgroup.json
  • 10:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 10:07 fabfur: reboot cp4038 and cp4046 for kernel upgrade (T335835)
  • 10:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4038.ulsfo.wmnet
  • 10:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netflow2002.codfw.wmnet
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:04 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netflow2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 2%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49405 and previous config saved to /var/cache/conftool/dbconfig/20230613-100043-root.json
  • 09:58 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts netflow2002.codfw.wmnet
  • 09:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49404 and previous config saved to /var/cache/conftool/dbconfig/20230613-094538-root.json
  • 09:45 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:38 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:38 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
  • 09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4045.ulsfo.wmnet
  • 09:24 fabfur: reboot cp4037 and cp4045 for kernel upgrade (T335835)
  • 09:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
  • 09:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp4045.ulsfo.wmnet
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2180 to upgrade to 10.6.14 T338918', diff saved to https://phabricator.wikimedia.org/P49403 and previous config saved to /var/cache/conftool/dbconfig/20230613-092208-root.json
  • 09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6008.drmrs.wmnet
  • 09:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6016.drmrs.wmnet
  • 09:08 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudservices2004-dev
  • 09:08 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:08 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:07 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudservices2004-dev decommissioned, removing all IPs except the asset tag one - aborrero@cumin2002"
  • 09:03 fabfur: reboot cp6008 and cp6016 for kernel upgrade (T335835)
  • 09:03 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6008.drmrs.wmnet
  • 09:03 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6016.drmrs.wmnet
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow1002.eqiad.wmnet with OS bookworm
  • 08:59 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 08:49 aborrero@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudservices2004-dev
  • 08:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow1002.eqiad.wmnet with reason: host reimage
  • 08:31 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow1002.eqiad.wmnet with reason: host reimage
  • 08:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6015.drmrs.wmnet
  • 08:30 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6007.drmrs.wmnet
  • 08:25 vgutierrez: cleaning up prometheus-https service from IPVS on lvs2014 - T326657
  • 08:22 fabfur: reboot cp6007 and cp6015 for kernel upgrade (T335835)
  • 08:22 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6007.drmrs.wmnet
  • 08:22 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6015.drmrs.wmnet
  • 08:20 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.13 refs T337527
  • 08:19 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow1002.eqiad.wmnet with OS bookworm
  • 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow3002.esams.wmnet with OS bookworm
  • 08:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6006.drmrs.wmnet
  • 08:00 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6014.drmrs.wmnet
  • 07:53 fabfur: reboot cp6006.drmrs.wmnet and cp6014.drmrs.wmnet for kernel upgrade (T335835)
  • 07:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6014.drmrs.wmnet
  • 07:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6006.drmrs.wmnet
  • 07:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow3002.esams.wmnet with reason: host reimage
  • 07:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6005.drmrs.wmnet
  • 07:32 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6013.drmrs.wmnet
  • 07:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow3002.esams.wmnet with reason: host reimage
  • 07:23 fabfur: rebooting cp6005.drmrs.wmnet and cp6013.drmrs.wmnet for upgrade
  • 07:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6005.drmrs.wmnet
  • 07:23 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6013.drmrs.wmnet
  • 07:10 elukey: move varnishkafka instances on cp4037 to PKI TLS certs - T337825
  • 07:09 kart_: Updated MinT to 2023-06-13-061519-production (T337656, T334465)
  • 07:08 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 07:08 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow3002.esams.wmnet with OS bookworm
  • 07:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6012.drmrs.wmnet
  • 07:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6004.drmrs.wmnet
  • 06:59 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:59 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 06:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 06:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 06:55 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 06:55 fabfur: rebooting cp6004.drmrs.wmnet and cp6012.drmrs.wmnet for upgrade
  • 06:55 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:54 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6012.drmrs.wmnet
  • 06:53 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6004.drmrs.wmnet
  • 06:51 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 06:48 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:41 kart_: Updated cxserver to 2023-06-13-054849-production (T338123, T338146, T337834)
  • 06:39 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:38 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:26 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:18 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:48 marostegui: dbmaint Deploy schema change on x1 eqiad with replication T337940
  • 03:54 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.11 (duration: 02m 13s)
  • 03:52 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.13 refs T337527 (duration: 49m 27s)
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.13 refs T337527
  • 02:54 eileen: civicrm upgraded from 5bbed553 to d63f548c
  • 02:46 eileen: civicrm upgraded from 5bbed553 to d63f548c
  • 00:10 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 00:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye

2023-06-12

  • 23:52 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 23:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 23:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 23:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 23:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 23:05 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS bullseye
  • 22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 22:22 brett: Roll restarting pybal on lvs2014 to revert prometheus service rollout - T326657
  • 22:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 22:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 22:07 cstone: payments-wiki upgraded from f3b229c6 to b1cf4f26
  • 21:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 21:20 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 21:20 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 20:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1003.eqiad.wmnet with OS bullseye
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable new Impact module for rowiki (T336203) (duration: 07m 06s)
  • 20:31 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable new Impact module for rowiki (T336203) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:29 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable new Impact module for rowiki (T336203)
  • 20:29 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people2003.codfw.wmnet
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
  • 20:29 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:29 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 20:28 urbanecm: Run extensions/GrowthExperiments/maintenance/refreshUserImpactData.php for rowiki (T336203)
  • 20:28 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 20:25 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:24 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable user impact refresh for rowiki (T336203) (duration: 06m 53s)
  • 20:23 dzahn@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:22 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 20:22 dzahn@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host people2003.codfw.wmnet with OS bookworm
  • 20:19 urbanecm@deploy1002: urbanecm: Backport for [Growth] Enable user impact refresh for rowiki (T336203) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:17 urbanecm@deploy1002: Started scap: Backport for [Growth] Enable user impact refresh for rowiki (T336203)
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText (duration: 11m 33s)
  • 20:14 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1010.eqiad.wmnet with OS bullseye
  • 20:14 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:11 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 20:06 urbanecm@deploy1002: daimona and urbanecm: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codf
  • 20:04 urbanecm@deploy1002: Started scap: Backport for prod: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336364), Remove references to $wgEnableLocalTimedText from CommonSettings, Remove unused variable wmgEnableLocalTimedText
  • 20:03 brett: Roll restarting pybal on lvs2014 then lvs2013 - T863380
  • 20:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS bullseye
  • 19:54 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 19:51 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet with reason: host reimage
  • 19:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['an-worker1149']
  • 19:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 19:38 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['an-worker1149']
  • 19:35 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@fb9dba3]: repoint drafttopic ingestion to model specific stream (duration: 00m 10s)
  • 19:35 ebernhardson@deploy1002: Started deploy [airflow-dags/search@fb9dba3]: repoint drafttopic ingestion to model specific stream
  • 19:33 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 19:33 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 19:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['an-worker1149']
  • 19:14 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 19:14 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 19:11 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 19:11 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 19:05 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people2003.codfw.wmnet with OS bookworm
  • 18:44 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:43 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2003.codfw.wmnet on all recursors
  • 18:42 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people2003.codfw.wmnet on all recursors
  • 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:42 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:42 robh@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 18:41 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2003.codfw.wmnet - dzahn@cumin1001"
  • 18:39 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:39 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people2003.codfw.wmnet
  • 18:37 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people1004.eqiad.wmnet
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 18:37 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:36 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:33 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 18:33 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:33 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:32 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:26 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:26 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
  • 18:25 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host people1004.eqiad.wmnet
  • 18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 18:25 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:24 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:21 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 18:21 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:21 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:20 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bullseye
  • 18:14 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:14 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
  • 18:09 dzahn@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host people1004.eqiad.wmnet
  • 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 18:09 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:06 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 18:04 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 18:04 dzahn@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host people1004.eqiad.wmnet with OS bookworm
  • 17:22 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 17:15 dzahn@cumin1001: START - Cookbook sre.hosts.reimage for host people1004.eqiad.wmnet with OS bookworm
  • 17:11 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 17:10 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1004.eqiad.wmnet on all recursors
  • 17:09 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache people1004.eqiad.wmnet on all recursors
  • 17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:09 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 17:08 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1004.eqiad.wmnet - dzahn@cumin1001"
  • 17:03 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 17:03 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host people1004.eqiad.wmnet
  • 17:03 mutante: creating ganeti VM people1004 with os==bookworm passed to makevm cookbook to test bookworm and because this is traditionally an early adopter of new distro releases
  • 16:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
  • 16:52 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 16:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 16:49 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 16:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['sretest1003']
  • 16:48 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:08 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6011.drmrs.wmnet
  • 16:07 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 03s)
  • 16:02 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 16:01 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Setup Incomplete
  • 16:01 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 21s)
  • 15:59 fabfur: reboot cp6011.drmrs.wmnet for upgrade
  • 15:59 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6011.drmrs.wmnet
  • 15:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6003.drmrs.wmnet
  • 15:43 fabfur: reboot cp6003.drmrs.wmnet for upgrade
  • 15:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6003.drmrs.wmnet
  • 15:34 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6010.drmrs.wmnet
  • 15:25 fabfur: rebooting cp6010.drmrs.wmnet for upgrade
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6010.drmrs.wmnet
  • 15:23 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6002.drmrs.wmnet
  • 15:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host backup1010.eqiad.wmnet with OS bullseye
  • 15:17 fabfur: reboot cp6002.drmrs.wmnet for upgrade
  • 15:14 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6002.drmrs.wmnet
  • 15:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6009.drmrs.wmnet
  • 15:00 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1011.eqiad.wmnet with OS bullseye
  • 15:00 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:58 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:56 fabfur: reboot cp6009.drmrs.wmnet for upgrade
  • 14:56 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6009.drmrs.wmnet
  • 14:51 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp6001.drmrs.wmnet
  • 14:44 fabfur: rebooting cp6001.drmrs.wmnet for upgrade
  • 14:42 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host cp6001.drmrs.wmnet
  • 14:41 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 14:38 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1011.eqiad.wmnet with reason: host reimage
  • 14:29 zabe: Deployed updated mitigations for T336027
  • 14:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 14:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 14:26 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 14:23 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 14:22 jclark@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 14:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 14:16 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 14:15 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 14:02 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107) (duration: 06m 49s)
  • 14:01 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:57 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:55 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgWikibaseTmpEnableLabelsInApiSummaries feature flag (T335107)
  • 13:54 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783) (duration: 06m 54s)
  • 13:51 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Remove wmgWikibaseTmpWbsubscribersSensibleOutput feature flag (T335783)
  • 13:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760) (duration: 07m 27s)
  • 13:40 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:38 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for [wikidatawiki] Add pagelang to wikidata-staff (T337760)
  • 13:32 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for ImageSuggestions: add help link to 4 new languages (T331036) (duration: 11m 23s)
  • 13:21 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and mfossati: Backport for ImageSuggestions: add help link to 4 new languages (T331036) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:20 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for ImageSuggestions: add help link to 4 new languages (T331036)
  • 13:15 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529) (duration: 10m 51s)
  • 13:14 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 13:13 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 13:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 13:10 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 13:09 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow6001.drmrs.wmnet with OS bookworm
  • 13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and daniel: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Switch VisualEditor to not use RESTbase on English Wikipedia. (T320529)
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow6001.drmrs.wmnet with reason: host reimage
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow6001.drmrs.wmnet with reason: host reimage
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow6001.drmrs.wmnet with OS bookworm
  • 12:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 12:01 ladsgroup@deploy1002: Finished scap: Backport for Set small wikis to read new for externallinks (T335343) (duration: 12m 22s)
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow5002.eqsin.wmnet with OS bookworm
  • 11:50 ladsgroup@deploy1002: ladsgroup: Backport for Set small wikis to read new for externallinks (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:49 ladsgroup@deploy1002: Started scap: Backport for Set small wikis to read new for externallinks (T335343)
  • 11:32 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 11:32 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 11:31 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 11:30 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 11:29 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 11:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow5002.eqsin.wmnet with reason: host reimage
  • 11:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow5002.eqsin.wmnet with reason: host reimage
  • 10:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 10:56 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 10:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 10:56 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 10:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow5002.eqsin.wmnet with OS bookworm
  • 10:40 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=enwiki --start 31000000 --end 110000000 --undo /home/ladsgroup/T128151.undo.sql --iconv DB cluster27 (T128151)
  • 10:08 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 09:57 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 09:48 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netflow4002.ulsfo.wmnet with OS bookworm
  • 09:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 09:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netflow4002.ulsfo.wmnet with reason: host reimage
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netflow4002.ulsfo.wmnet with reason: host reimage
  • 08:56 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 08:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 08:50 taavi@deploy1002: Finished scap: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621) (duration: 16m 44s)
  • 08:42 taavi@deploy1002: superpes and taavi: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host netflow4002.ulsfo.wmnet with OS bookworm
  • 08:33 taavi@deploy1002: Started scap: Backport for [knwiki] Add a temporary logo for the 20th anniversary (T338136), [lmowiki] Removing the Purtaal namespace and fixing the Portal talk translation (T338621)
  • 08:30 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 08:30 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 08:30 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 07:01 moritzm: upgrading bookworm netboot images to final/released bookworm images T330495
  • 06:54 kart_: Updated MinT to 2023-06-10-124931-production (T284905)
  • 06:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 06:44 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
  • 06:41 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 06:36 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 06:36 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 06:16 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 04:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 02:51 Amir1: ladsgroup@mwmaint1002:~$ mwscript maintenance/storage/moveToExternal.php --wiki=enwiki --end 32000000 --undo /home/ladsgroup/T128151.undo.sql --iconv DB cluster27 (T128151)

2023-06-11

2023-06-10

  • 17:58 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 17:58 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye

2023-06-09

  • 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 21:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 20:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 20:38 btullis@cumin1001: END (ERROR) - Cookbook sre.aqs.roll-restart-reboot (exit_code=97) rolling restart_daemons on A:aqs
  • 20:23 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
  • 17:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS bullseye
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host snapshot1016.eqiad.wmnet with OS buster
  • 17:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49398 and previous config saved to /var/cache/conftool/dbconfig/20230609-173202-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49397 and previous config saved to /var/cache/conftool/dbconfig/20230609-171656-ladsgroup.json
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P49396 and previous config saved to /var/cache/conftool/dbconfig/20230609-170150-ladsgroup.json
  • 16:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host snapshot1016.eqiad.wmnet with OS buster
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49395 and previous config saved to /var/cache/conftool/dbconfig/20230609-164644-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T336886)', diff saved to https://phabricator.wikimedia.org/P49394 and previous config saved to /var/cache/conftool/dbconfig/20230609-163007-ladsgroup.json
  • 16:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49393 and previous config saved to /var/cache/conftool/dbconfig/20230609-162946-ladsgroup.json
  • 16:20 urandom: powercycling restbase1028
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49392 and previous config saved to /var/cache/conftool/dbconfig/20230609-161440-ladsgroup.json
  • 16:05 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['snapshot1016']
  • 16:02 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['snapshot1016']
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P49391 and previous config saved to /var/cache/conftool/dbconfig/20230609-155934-ladsgroup.json
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49390 and previous config saved to /var/cache/conftool/dbconfig/20230609-154428-ladsgroup.json
  • 15:30 andrewbogott: wikitech-static: deleted everything in /srv/mediawiki/images/wikitech/archive for T338520
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T336886)', diff saved to https://phabricator.wikimedia.org/P49388 and previous config saved to /var/cache/conftool/dbconfig/20230609-152845-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49387 and previous config saved to /var/cache/conftool/dbconfig/20230609-152824-ladsgroup.json
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:27 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host snapshot1016.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
  • 15:22 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for snapshot101[6-7] - pt1979@cumin2002"
  • 15:17 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49386 and previous config saved to /var/cache/conftool/dbconfig/20230609-151318-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P49385 and previous config saved to /var/cache/conftool/dbconfig/20230609-145812-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49384 and previous config saved to /var/cache/conftool/dbconfig/20230609-144305-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T336886)', diff saved to https://phabricator.wikimedia.org/P49383 and previous config saved to /var/cache/conftool/dbconfig/20230609-142731-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49382 and previous config saved to /var/cache/conftool/dbconfig/20230609-142655-ladsgroup.json
  • 14:14 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49381 and previous config saved to /var/cache/conftool/dbconfig/20230609-141149-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P49380 and previous config saved to /var/cache/conftool/dbconfig/20230609-135643-ladsgroup.json
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49379 and previous config saved to /var/cache/conftool/dbconfig/20230609-134137-ladsgroup.json
  • 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 13:29 sukhe: start pybal on lvs2013
  • 13:28 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cp4037.ulsfo.wmnet with reason: Working on vk
  • 13:25 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49378 and previous config saved to /var/cache/conftool/dbconfig/20230609-132541-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49377 and previous config saved to /var/cache/conftool/dbconfig/20230609-132520-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49376 and previous config saved to /var/cache/conftool/dbconfig/20230609-131014-ladsgroup.json
  • 13:07 sukhe: stop pybal on lvs2013 to test lvs2014
  • 13:02 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2014
  • 13:02 sukhe: sudo cumin 'A:lvs and A:codfw' 'enable-puppet "CR 928818"'
  • 13:01 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:59 sukhe: sudo cumin 'A:lvs and A:codfw' 'disable-puppet "CR 928818"'
  • 12:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
  • 12:57 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2014
  • 12:55 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2014
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P49373 and previous config saved to /var/cache/conftool/dbconfig/20230609-125508-ladsgroup.json
  • 12:50 krinkle@deploy1002: Finished scap: I385d28 (duration: 06m 59s)
  • 12:43 krinkle@deploy1002: Started scap: I385d28
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49371 and previous config saved to /var/cache/conftool/dbconfig/20230609-124002-ladsgroup.json
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:30 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
  • 12:29 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Re-add DNS for cloud-hosts-codfw vlan. - cmooney@cumin1001"
  • 12:27 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T336886)', diff saved to https://phabricator.wikimedia.org/P49370 and previous config saved to /var/cache/conftool/dbconfig/20230609-122303-ladsgroup.json
  • 12:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49369 and previous config saved to /var/cache/conftool/dbconfig/20230609-122243-ladsgroup.json
  • 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:16 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
  • 12:15 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudgw2003-dev - aborrero@cumin2002"
  • 12:13 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49368 and previous config saved to /var/cache/conftool/dbconfig/20230609-120737-ladsgroup.json
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 778 hosts
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P49367 and previous config saved to /var/cache/conftool/dbconfig/20230609-115230-ladsgroup.json
  • 11:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 778 hosts
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Fsero out of all services on: 1262 hosts
  • 11:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Fsero out of all services on: 1262 hosts
  • 11:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49366 and previous config saved to /var/cache/conftool/dbconfig/20230609-113724-ladsgroup.json
  • 11:27 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T336886)', diff saved to https://phabricator.wikimedia.org/P49365 and previous config saved to /var/cache/conftool/dbconfig/20230609-112250-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49364 and previous config saved to /var/cache/conftool/dbconfig/20230609-112229-ladsgroup.json
  • 11:20 sukhe: pcc-db1001: sudo systemctl start pcc_facts_processor.service
  • 11:14 sukhe: sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49363 and previous config saved to /var/cache/conftool/dbconfig/20230609-110723-ladsgroup.json
  • 11:02 sukhe: homer "cr*-codfw*" commit "Gerrit: 928113 add new LVS host lvs2014"
  • 10:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P49362 and previous config saved to /var/cache/conftool/dbconfig/20230609-105217-ladsgroup.json
  • 10:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49361 and previous config saved to /var/cache/conftool/dbconfig/20230609-103711-ladsgroup.json
  • 10:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T336886)', diff saved to https://phabricator.wikimedia.org/P49360 and previous config saved to /var/cache/conftool/dbconfig/20230609-102217-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49359 and previous config saved to /var/cache/conftool/dbconfig/20230609-102156-ladsgroup.json
  • 10:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 10:12 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 10:12 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 10:09 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 10:08 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49358 and previous config saved to /var/cache/conftool/dbconfig/20230609-100650-ladsgroup.json
  • 09:57 elukey: increase {eqiad,codfw}.change-prop.transcludes.resource-change topic partitions (3->5) on kafka main clusters - T338357
  • 09:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:54 moritzm: installing jupyter-core security updates on bullseye
  • 09:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P49357 and previous config saved to /var/cache/conftool/dbconfig/20230609-095144-ladsgroup.json
  • 09:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49356 and previous config saved to /var/cache/conftool/dbconfig/20230609-093638-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T336886)', diff saved to https://phabricator.wikimedia.org/P49355 and previous config saved to /var/cache/conftool/dbconfig/20230609-092141-ladsgroup.json
  • 09:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49354 and previous config saved to /var/cache/conftool/dbconfig/20230609-090829-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49353 and previous config saved to /var/cache/conftool/dbconfig/20230609-085322-ladsgroup.json
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P49352 and previous config saved to /var/cache/conftool/dbconfig/20230609-083816-ladsgroup.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49351 and previous config saved to /var/cache/conftool/dbconfig/20230609-082310-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T336886)', diff saved to https://phabricator.wikimedia.org/P49350 and previous config saved to /var/cache/conftool/dbconfig/20230609-080708-ladsgroup.json
  • 08:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49349 and previous config saved to /var/cache/conftool/dbconfig/20230609-080637-ladsgroup.json
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49348 and previous config saved to /var/cache/conftool/dbconfig/20230609-075130-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P49347 and previous config saved to /var/cache/conftool/dbconfig/20230609-073624-ladsgroup.json
  • 07:33 jmm@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=mw1492.eqiad.wmnet
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49346 and previous config saved to /var/cache/conftool/dbconfig/20230609-072118-ladsgroup.json
  • 07:19 moritzm: powercycling restbase2018 (kernel hung following what looks like I/O errors)
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T336886)', diff saved to https://phabricator.wikimedia.org/P49345 and previous config saved to /var/cache/conftool/dbconfig/20230609-070520-ladsgroup.json
  • 07:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49344 and previous config saved to /var/cache/conftool/dbconfig/20230609-070459-ladsgroup.json
  • 06:50 moritzm: installing wireshark security updates
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49343 and previous config saved to /var/cache/conftool/dbconfig/20230609-064953-ladsgroup.json
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster2005.codfw.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster2005.codfw.wmnet
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: puppetmaster1005.eqiad.wmnet
  • 06:49 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: puppetmaster1005.eqiad.wmnet
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: prometheus3001.esams.wmnet
  • 06:48 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: prometheus3001.esams.wmnet
  • 06:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 06:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 06:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P49342 and previous config saved to /var/cache/conftool/dbconfig/20230609-063447-ladsgroup.json
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49341 and previous config saved to /var/cache/conftool/dbconfig/20230609-061941-ladsgroup.json
  • 06:06 eileen: config 97c57848 -> 6f4a9d19 restart jobs
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T336886)', diff saved to https://phabricator.wikimedia.org/P49340 and previous config saved to /var/cache/conftool/dbconfig/20230609-060438-ladsgroup.json
  • 06:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 06:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 05:53 eileen: civicrm upgraded from 158896cc to 5bbed553
  • 05:52 eileen: config revision changed from 8b71fa7a to 97c57848
  • 05:50 moritzm: installing cpio security updates
  • 05:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49339 and previous config saved to /var/cache/conftool/dbconfig/20230609-052315-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49338 and previous config saved to /var/cache/conftool/dbconfig/20230609-050809-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P49337 and previous config saved to /var/cache/conftool/dbconfig/20230609-045302-ladsgroup.json
  • 04:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49336 and previous config saved to /var/cache/conftool/dbconfig/20230609-043756-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1219 (T336886)', diff saved to https://phabricator.wikimedia.org/P49335 and previous config saved to /var/cache/conftool/dbconfig/20230609-042306-ladsgroup.json
  • 04:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 04:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49334 and previous config saved to /var/cache/conftool/dbconfig/20230609-042246-ladsgroup.json
  • 04:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49333 and previous config saved to /var/cache/conftool/dbconfig/20230609-040739-ladsgroup.json
  • 03:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P49332 and previous config saved to /var/cache/conftool/dbconfig/20230609-035233-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49331 and previous config saved to /var/cache/conftool/dbconfig/20230609-033727-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T336886)', diff saved to https://phabricator.wikimedia.org/P49330 and previous config saved to /var/cache/conftool/dbconfig/20230609-032127-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49329 and previous config saved to /var/cache/conftool/dbconfig/20230609-032106-ladsgroup.json
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49328 and previous config saved to /var/cache/conftool/dbconfig/20230609-030600-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P49327 and previous config saved to /var/cache/conftool/dbconfig/20230609-025054-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49326 and previous config saved to /var/cache/conftool/dbconfig/20230609-023548-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T336886)', diff saved to https://phabricator.wikimedia.org/P49325 and previous config saved to /var/cache/conftool/dbconfig/20230609-022054-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 02:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49324 and previous config saved to /var/cache/conftool/dbconfig/20230609-022034-ladsgroup.json
  • 02:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49323 and previous config saved to /var/cache/conftool/dbconfig/20230609-020528-ladsgroup.json
  • 02:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 02:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 02:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P49322 and previous config saved to /var/cache/conftool/dbconfig/20230609-015021-ladsgroup.json
  • 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 01:48 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 01:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49321 and previous config saved to /var/cache/conftool/dbconfig/20230609-013515-ladsgroup.json
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki-root1002.eqiad.wmnet with OS bullseye
  • 01:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T336886)', diff saved to https://phabricator.wikimedia.org/P49320 and previous config saved to /var/cache/conftool/dbconfig/20230609-011945-ladsgroup.json
  • 01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49319 and previous config saved to /var/cache/conftool/dbconfig/20230609-011924-ladsgroup.json
  • 01:08 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49318 and previous config saved to /var/cache/conftool/dbconfig/20230609-010418-ladsgroup.json
  • 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 00:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 00:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1011.eqiad.wmnet with OS bullseye
  • 00:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P49317 and previous config saved to /var/cache/conftool/dbconfig/20230609-004912-ladsgroup.json
  • 00:48 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1002.eqiad.wmnet with reason: host reimage
  • 00:47 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 00:34 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host pki-root1002.eqiad.wmnet with OS bullseye
  • 00:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49316 and previous config saved to /var/cache/conftool/dbconfig/20230609-003406-ladsgroup.json
  • 00:31 eileen: civicrm upgraded from 6f64e77d to 158896cc
  • 00:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pki-root1002']
  • 00:25 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
  • 00:24 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['pki-root1002']
  • 00:24 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pki-root1002']
  • 00:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T336886)', diff saved to https://phabricator.wikimedia.org/P49315 and previous config saved to /var/cache/conftool/dbconfig/20230609-001821-ladsgroup.json
  • 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49314 and previous config saved to /var/cache/conftool/dbconfig/20230609-001732-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49313 and previous config saved to /var/cache/conftool/dbconfig/20230609-000226-ladsgroup.json

2023-06-08

  • 23:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1011.eqiad.wmnet with OS bullseye
  • 23:54 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1010.eqiad.wmnet with OS bullseye
  • 23:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 23:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host backup1010.eqiad.wmnet with OS bullseye
  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P49312 and previous config saved to /var/cache/conftool/dbconfig/20230608-234720-ladsgroup.json
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host pki-root1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
  • 23:40 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for pki-root - pt1979@cumin2002"
  • 23:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49311 and previous config saved to /var/cache/conftool/dbconfig/20230608-233214-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T336886)', diff saved to https://phabricator.wikimedia.org/P49310 and previous config saved to /var/cache/conftool/dbconfig/20230608-231650-ladsgroup.json
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49309 and previous config saved to /var/cache/conftool/dbconfig/20230608-231629-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49308 and previous config saved to /var/cache/conftool/dbconfig/20230608-230123-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P49307 and previous config saved to /var/cache/conftool/dbconfig/20230608-224617-ladsgroup.json
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
  • 22:39 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on gerrit1001.wikimedia.org with reason: decom
  • 22:37 mutante: gerrit1001 - rmdir /etc/ssh/userkeys/gerrit.d, which leads to puppet warnings because it can't remove the empty dir
  • 22:35 mutante: removing gerrit role from former gerrit prod machine gerrit1001; removes firewall rules, shell access, monitoring, etc.
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49306 and previous config saved to /var/cache/conftool/dbconfig/20230608-223111-ladsgroup.json
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T336886)', diff saved to https://phabricator.wikimedia.org/P49305 and previous config saved to /var/cache/conftool/dbconfig/20230608-221536-ladsgroup.json
  • 22:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 22:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49304 and previous config saved to /var/cache/conftool/dbconfig/20230608-221515-ladsgroup.json
  • 22:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49303 and previous config saved to /var/cache/conftool/dbconfig/20230608-220009-ladsgroup.json
  • 21:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P49302 and previous config saved to /var/cache/conftool/dbconfig/20230608-214503-ladsgroup.json
  • 21:31 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49301 and previous config saved to /var/cache/conftool/dbconfig/20230608-212957-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T336886)', diff saved to https://phabricator.wikimedia.org/P49300 and previous config saved to /var/cache/conftool/dbconfig/20230608-211419-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 21:08 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:07 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:06 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1011.eqiad.wmnet']
  • 21:05 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup1010.eqiad.wmnet']
  • 21:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 21:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49298 and previous config saved to /var/cache/conftool/dbconfig/20230608-204722-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49297 and previous config saved to /var/cache/conftool/dbconfig/20230608-203216-ladsgroup.json
  • 20:31 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Make port part of the index (T337149) (duration: 10m 10s)
  • 20:22 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Make port part of the index (T337149) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:21 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 20:20 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Make port part of the index (T337149)
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P49296 and previous config saved to /var/cache/conftool/dbconfig/20230608-201710-ladsgroup.json
  • 20:12 ladsgroup@deploy1002: Finished scap: Backport for Remove VectorLimitedWidthIndicator (T336197) (duration: 07m 32s)
  • 20:06 ladsgroup@deploy1002: ladsgroup and ksarabia: Backport for Remove VectorLimitedWidthIndicator (T336197) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:05 ladsgroup@deploy1002: Started scap: Backport for Remove VectorLimitedWidthIndicator (T336197)
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49295 and previous config saved to /var/cache/conftool/dbconfig/20230608-200204-ladsgroup.json
  • 20:01 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:56 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 19:54 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1028.eqiad.wmnet with reason: host reimage
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T336886)', diff saved to https://phabricator.wikimedia.org/P49294 and previous config saved to /var/cache/conftool/dbconfig/20230608-194555-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49293 and previous config saved to /var/cache/conftool/dbconfig/20230608-194534-ladsgroup.json
  • 19:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1028.eqiad.wmnet with OS bullseye
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49292 and previous config saved to /var/cache/conftool/dbconfig/20230608-193028-ladsgroup.json
  • 19:22 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P49291 and previous config saved to /var/cache/conftool/dbconfig/20230608-191522-ladsgroup.json
  • 19:08 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49290 and previous config saved to /var/cache/conftool/dbconfig/20230608-190016-ladsgroup.json
  • 18:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T336886)', diff saved to https://phabricator.wikimedia.org/P49289 and previous config saved to /var/cache/conftool/dbconfig/20230608-184312-ladsgroup.json
  • 18:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49288 and previous config saved to /var/cache/conftool/dbconfig/20230608-184251-ladsgroup.json
  • 18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1011.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:36 jclark@cumin1001: START - Cookbook sre.hosts.provision for host backup1010.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49287 and previous config saved to /var/cache/conftool/dbconfig/20230608-182745-ladsgroup.json
  • 18:24 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in eqiad: maintenance
  • 18:19 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in eqiad: maintenance
  • 18:18 urandom: (Re)pooling sessionstore/eqiad — T337426
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P49286 and previous config saved to /var/cache/conftool/dbconfig/20230608-181238-ladsgroup.json
  • 18:09 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.12 refs T337526
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49285 and previous config saved to /var/cache/conftool/dbconfig/20230608-175732-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49284 and previous config saved to /var/cache/conftool/dbconfig/20230608-174135-ladsgroup.json
  • 17:41 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:36 stevemunene@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 17:35 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:31 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:31 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:30 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:30 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:28 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49283 and previous config saved to /var/cache/conftool/dbconfig/20230608-172746-ladsgroup.json
  • 17:24 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49282 and previous config saved to /var/cache/conftool/dbconfig/20230608-171240-ladsgroup.json
  • 17:10 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 17:05 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 17:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:58 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P49281 and previous config saved to /var/cache/conftool/dbconfig/20230608-165734-ladsgroup.json
  • 16:56 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs
  • 16:46 urandom: Starting traffic test against sessionstore.svc.eqiad.wmnet — T337426
  • 16:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49280 and previous config saved to /var/cache/conftool/dbconfig/20230608-164228-ladsgroup.json
  • 16:41 urandom: Upgrading Cassandra to 4.1.1, sessionstore1003 — T337426
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetmaster1006.eqiad.wmnet with reason: host reimage
  • 16:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 16:36 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 16:35 urandom: Upgrading Cassandra to 4.1.1, sessionstore1002 — T337426
  • 16:34 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T336886)', diff saved to https://phabricator.wikimedia.org/P49279 and previous config saved to /var/cache/conftool/dbconfig/20230608-162650-ladsgroup.json
  • 16:26 urandom: Upgrading Cassandra to 4.1.1, sessionstore1001 — T337426
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 16:22 urandom: creating pre-upgrade Cassandra snapshots, sessionstore/eqiad — T337426
  • 16:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 16:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 16:11 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in eqiad: maintenance
  • 16:06 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
  • 16:06 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in eqiad: maintenance
  • 16:06 urandom: depooling eqiad sessionstore for Cassandra upgrade — T337426
  • 16:00 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 15:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs2014.codfw.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 15:20 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host puppetmaster1006.eqiad.wmnet with OS bullseye
  • 15:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['puppetmaster1006']
  • 15:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['puppetmaster1006']
  • 15:09 moritzm: installing c-ares security updates on bullseye
  • 14:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:42 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 14:36 moritzm: installing libwebp security updates on buster
  • 14:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['sretest1003']
  • 14:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['sretest1003']
  • 14:28 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 14:28 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2014.codfw.wmnet with OS bullseye
  • 14:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:13 XioNoX: cloudsw2-c8-eqiad> request system zeroize - T338459
  • 14:13 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:09 XioNoX: decom cloudsw2-c8-eqiad - T338459
  • 14:08 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:07 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1001"
  • 14:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:07 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:06 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:04 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:02 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 14:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:00 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:59 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:58 ladsgroup@deploy1002: Finished scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) (duration: 09m 13s)
  • 13:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2014.codfw.wmnet with reason: host reimage
  • 13:52 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 13:52 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:51 ladsgroup@deploy1002: ladsgroup: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:51 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:49 ladsgroup@deploy1002: Started scap: Backport for Remove svwiktionary, svwiki and dawiki from legacy encoding (T128156 T128152 T128153)
  • 13:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host puppetmaster1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:48 jclark@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1002.eqiad.wmnet with reason: host reimage
  • 13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:44 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 13:43 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:43 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse for new ns-recursor.openstack.codfw1dev.wikimediacloud.org IP. - cmooney@cumin1001"
  • 13:41 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:39 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 13:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 13:30 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:29 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:05 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:05 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:36 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 17m 22s)
  • 12:19 topranks: De-pooling lvs1017 to move link to lsw1-e1-eqiad to ssw1-e1-eqiad T322937
  • 12:18 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 12:12 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:11 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:03 vgutierrez: restore cp4052 HAProxy configuration - T317799
  • 11:51 vgutierrez: repooling cp4052 - T317799
  • 11:40 vgutierrez: depooling cp4052 for some HAProxy tests - T317799
  • 11:28 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=nlwiki --iconv DB cluster26 (T128154)
  • 11:03 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=dawiki --iconv DB cluster27 (T128153)
  • 10:49 Amir1: mwscript maintenance/storage/moveToExternal.php --wiki=svwiki --iconv DB cluster27 (T128153)
  • 10:22 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:21 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided) (duration: 00m 08s)
  • 09:57 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@bb7526e]: (no justification provided)
  • 09:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2001.codfw.wmnet with OS bookworm
  • 09:40 jbond@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 09:24 vgutierrez: updated to HAProxy 2.7.9 on cp4052 and cp5032
  • 09:22 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
  • 09:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 09:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 09:17 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.eqsin.wmnet,cp4052.ulsfo.wmnet} and A:cp
  • 09:10 vgutierrez: fetch HAProxy 2.7.9 for thirdparty/haproxy27 bullseye (apt.wm.o)
  • 08:54 apergos: UTC morning backport and config training window done
  • 08:38 ariel@deploy1002: Finished scap: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430) (duration: 08m 25s)
  • 08:31 ariel@deploy1002: ariel and superpes: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:30 ariel@deploy1002: Started scap: Backport for [ruwiki] Add an editautoreviewprotected level protecion (T337430)
  • 08:25 ariel@deploy1002: Finished scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) (duration: 09m 16s)
  • 08:17 ariel@deploy1002: superpes and ariel: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:16 ariel@deploy1002: Started scap: Backport for [fiwiki] Limitate the use of the ContentTranslation tool (T337412)
  • 08:12 ariel@deploy1002: Finished scap: Backport for [itwiktionary] Add a tagline (T337688) (duration: 08m 07s)
  • 08:06 ariel@deploy1002: ariel and superpes: Backport for [itwiktionary] Add a tagline (T337688) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:04 ariel@deploy1002: Started scap: Backport for [itwiktionary] Add a tagline (T337688)
  • 07:49 ariel@deploy1002: Finished scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) (duration: 09m 09s)
  • 07:41 ariel@deploy1002: ariel and superpes: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:40 ariel@deploy1002: Started scap: Backport for [kaawiki] Change the logo with an HD version and the tagline (T337641)
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49271 and previous config saved to /var/cache/conftool/dbconfig/20230608-073524-ladsgroup.json
  • 07:27 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) (duration: 09m 19s)
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49270 and previous config saved to /var/cache/conftool/dbconfig/20230608-072018-ladsgroup.json
  • 07:19 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:17 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337834)
  • 07:14 elukey: delete pod kask-production-7dfdfc7cbc-2vw5q in wikikube codfw, since it was scheduled on a non dedicated node
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) (duration: 09m 52s)
  • 07:06 kartik@deploy1002: kartik: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P49268 and previous config saved to /var/cache/conftool/dbconfig/20230608-070512-ladsgroup.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable Content and Section Translation for 9 Wikipedia (T337290)
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49267 and previous config saved to /var/cache/conftool/dbconfig/20230608-065006-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T336886)', diff saved to https://phabricator.wikimedia.org/P49266 and previous config saved to /var/cache/conftool/dbconfig/20230608-064508-ladsgroup.json
  • 06:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 06:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49265 and previous config saved to /var/cache/conftool/dbconfig/20230608-064447-ladsgroup.json
  • 06:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49264 and previous config saved to /var/cache/conftool/dbconfig/20230608-062941-ladsgroup.json
  • 06:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P49263 and previous config saved to /var/cache/conftool/dbconfig/20230608-061435-ladsgroup.json
  • 06:10 elukey: kill remaining processes for `andyrussg` on stat100x nodes to unblock puppet
  • 05:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49262 and previous config saved to /var/cache/conftool/dbconfig/20230608-055929-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T336886)', diff saved to https://phabricator.wikimedia.org/P49261 and previous config saved to /var/cache/conftool/dbconfig/20230608-055432-ladsgroup.json
  • 05:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49260 and previous config saved to /var/cache/conftool/dbconfig/20230608-055411-ladsgroup.json
  • 05:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49259 and previous config saved to /var/cache/conftool/dbconfig/20230608-053904-ladsgroup.json
  • 05:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P49258 and previous config saved to /var/cache/conftool/dbconfig/20230608-052358-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49257 and previous config saved to /var/cache/conftool/dbconfig/20230608-050852-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T336886)', diff saved to https://phabricator.wikimedia.org/P49256 and previous config saved to /var/cache/conftool/dbconfig/20230608-050353-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49255 and previous config saved to /var/cache/conftool/dbconfig/20230608-050328-ladsgroup.json
  • 04:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49254 and previous config saved to /var/cache/conftool/dbconfig/20230608-044821-ladsgroup.json
  • 04:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P49253 and previous config saved to /var/cache/conftool/dbconfig/20230608-043315-ladsgroup.json
  • 04:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49252 and previous config saved to /var/cache/conftool/dbconfig/20230608-041809-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49251 and previous config saved to /var/cache/conftool/dbconfig/20230608-041311-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 04:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49250 and previous config saved to /var/cache/conftool/dbconfig/20230608-040935-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49249 and previous config saved to /var/cache/conftool/dbconfig/20230608-035428-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P49248 and previous config saved to /var/cache/conftool/dbconfig/20230608-033922-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49247 and previous config saved to /var/cache/conftool/dbconfig/20230608-032416-ladsgroup.json
  • 03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49246 and previous config saved to /var/cache/conftool/dbconfig/20230608-031911-ladsgroup.json
  • 03:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 03:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49245 and previous config saved to /var/cache/conftool/dbconfig/20230608-031901-ladsgroup.json
  • 03:11 eileen: civicrm upgraded from 066095b8 to 6f64e77d
  • 03:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49244 and previous config saved to /var/cache/conftool/dbconfig/20230608-030355-ladsgroup.json
  • 02:54 samtar@deploy1002: Finished scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) (duration: 09m 50s)
  • 02:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P49243 and previous config saved to /var/cache/conftool/dbconfig/20230608-024849-ladsgroup.json
  • 02:46 samtar@deploy1002: samtar: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 02:44 samtar@deploy1002: Started scap: Backport for Remove additional v1 suffix when computing internalRestbaseURL (T334842 T338381)
  • 02:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49242 and previous config saved to /var/cache/conftool/dbconfig/20230608-023343-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49241 and previous config saved to /var/cache/conftool/dbconfig/20230608-022842-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49240 and previous config saved to /var/cache/conftool/dbconfig/20230608-022821-ladsgroup.json
  • 02:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49239 and previous config saved to /var/cache/conftool/dbconfig/20230608-021315-ladsgroup.json
  • 01:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P49238 and previous config saved to /var/cache/conftool/dbconfig/20230608-015809-ladsgroup.json
  • 01:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49237 and previous config saved to /var/cache/conftool/dbconfig/20230608-014303-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T336886)', diff saved to https://phabricator.wikimedia.org/P49236 and previous config saved to /var/cache/conftool/dbconfig/20230608-013808-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49235 and previous config saved to /var/cache/conftool/dbconfig/20230608-013736-ladsgroup.json
  • 01:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49234 and previous config saved to /var/cache/conftool/dbconfig/20230608-012230-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49233 and previous config saved to /var/cache/conftool/dbconfig/20230608-010853-ladsgroup.json
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P49232 and previous config saved to /var/cache/conftool/dbconfig/20230608-010724-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49231 and previous config saved to /var/cache/conftool/dbconfig/20230608-005347-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49230 and previous config saved to /var/cache/conftool/dbconfig/20230608-005218-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T336886)', diff saved to https://phabricator.wikimedia.org/P49229 and previous config saved to /var/cache/conftool/dbconfig/20230608-004713-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 00:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49228 and previous config saved to /var/cache/conftool/dbconfig/20230608-004653-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P49227 and previous config saved to /var/cache/conftool/dbconfig/20230608-003841-ladsgroup.json
  • 00:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49226 and previous config saved to /var/cache/conftool/dbconfig/20230608-003146-ladsgroup.json
  • 00:28 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-cluster (exit_code=99)
  • 00:28 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-cluster
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49225 and previous config saved to /var/cache/conftool/dbconfig/20230608-002335-ladsgroup.json
  • 00:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P49224 and previous config saved to /var/cache/conftool/dbconfig/20230608-001640-ladsgroup.json
  • 00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T336886)', diff saved to https://phabricator.wikimedia.org/P49223 and previous config saved to /var/cache/conftool/dbconfig/20230608-001555-ladsgroup.json
  • 00:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49222 and previous config saved to /var/cache/conftool/dbconfig/20230608-001534-ladsgroup.json
  • 00:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49221 and previous config saved to /var/cache/conftool/dbconfig/20230608-000134-ladsgroup.json
  • 00:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49220 and previous config saved to /var/cache/conftool/dbconfig/20230608-000028-ladsgroup.json

2023-06-07

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T336886)', diff saved to https://phabricator.wikimedia.org/P49219 and previous config saved to /var/cache/conftool/dbconfig/20230607-235624-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49218 and previous config saved to /var/cache/conftool/dbconfig/20230607-235603-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P49217 and previous config saved to /var/cache/conftool/dbconfig/20230607-234522-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49216 and previous config saved to /var/cache/conftool/dbconfig/20230607-234057-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49215 and previous config saved to /var/cache/conftool/dbconfig/20230607-233016-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P49214 and previous config saved to /var/cache/conftool/dbconfig/20230607-232551-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49213 and previous config saved to /var/cache/conftool/dbconfig/20230607-232223-ladsgroup.json
  • 23:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49212 and previous config saved to /var/cache/conftool/dbconfig/20230607-232203-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49211 and previous config saved to /var/cache/conftool/dbconfig/20230607-231045-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49210 and previous config saved to /var/cache/conftool/dbconfig/20230607-230657-ladsgroup.json
  • 23:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T336886)', diff saved to https://phabricator.wikimedia.org/P49209 and previous config saved to /var/cache/conftool/dbconfig/20230607-230540-ladsgroup.json
  • 23:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 23:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49208 and previous config saved to /var/cache/conftool/dbconfig/20230607-225926-ladsgroup.json
  • 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P49207 and previous config saved to /var/cache/conftool/dbconfig/20230607-225150-ladsgroup.json
  • 22:45 zabe@deploy1002: Finished scap: T338287 (duration: 07m 30s)
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49206 and previous config saved to /var/cache/conftool/dbconfig/20230607-224420-ladsgroup.json
  • 22:38 zabe@deploy1002: Started scap: T338287
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49205 and previous config saved to /var/cache/conftool/dbconfig/20230607-223644-ladsgroup.json
  • 22:34 zabe@deploy1002: Sync cancelled.
  • 22:34 zabe@deploy1002: zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 22:32 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P49204 and previous config saved to /var/cache/conftool/dbconfig/20230607-222914-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49203 and previous config saved to /var/cache/conftool/dbconfig/20230607-222905-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49202 and previous config saved to /var/cache/conftool/dbconfig/20230607-222844-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49201 and previous config saved to /var/cache/conftool/dbconfig/20230607-221408-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49200 and previous config saved to /var/cache/conftool/dbconfig/20230607-221338-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1221 (T336886)', diff saved to https://phabricator.wikimedia.org/P49199 and previous config saved to /var/cache/conftool/dbconfig/20230607-220859-ladsgroup.json
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49198 and previous config saved to /var/cache/conftool/dbconfig/20230607-220821-ladsgroup.json
  • 22:05 eileen: civicrm upgraded from bcc8fccc to 066095b8
  • 22:05 zabe@deploy1002: Finished scap: Backport for Use cuc_timestamp as index field when reading old (T338287) (duration: 11m 48s)
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P49197 and previous config saved to /var/cache/conftool/dbconfig/20230607-215831-ladsgroup.json
  • 21:55 zabe@deploy1002: dreamyjazz and zabe: Backport for Use cuc_timestamp as index field when reading old (T338287) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:53 zabe@deploy1002: Started scap: Backport for Use cuc_timestamp as index field when reading old (T338287)
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49196 and previous config saved to /var/cache/conftool/dbconfig/20230607-215315-ladsgroup.json
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49195 and previous config saved to /var/cache/conftool/dbconfig/20230607-214325-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P49194 and previous config saved to /var/cache/conftool/dbconfig/20230607-213809-ladsgroup.json
  • 21:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
  • 21:36 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
  • 21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49193 and previous config saved to /var/cache/conftool/dbconfig/20230607-213530-ladsgroup.json
  • 21:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 21:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 21:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49192 and previous config saved to /var/cache/conftool/dbconfig/20230607-213509-ladsgroup.json
  • 21:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 21:32 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs1016.eqiad.wmnet
  • 21:32 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs1016.eqiad.wmnet
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49191 and previous config saved to /var/cache/conftool/dbconfig/20230607-212303-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49190 and previous config saved to /var/cache/conftool/dbconfig/20230607-212003-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T336886)', diff saved to https://phabricator.wikimedia.org/P49189 and previous config saved to /var/cache/conftool/dbconfig/20230607-211807-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49188 and previous config saved to /var/cache/conftool/dbconfig/20230607-211746-ladsgroup.json
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P49187 and previous config saved to /var/cache/conftool/dbconfig/20230607-210457-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49186 and previous config saved to /var/cache/conftool/dbconfig/20230607-210240-ladsgroup.json
  • 20:50 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 20:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49185 and previous config saved to /var/cache/conftool/dbconfig/20230607-204951-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P49184 and previous config saved to /var/cache/conftool/dbconfig/20230607-204734-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T336886)', diff saved to https://phabricator.wikimedia.org/P49183 and previous config saved to /var/cache/conftool/dbconfig/20230607-204728-ladsgroup.json
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 20:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49182 and previous config saved to /var/cache/conftool/dbconfig/20230607-204652-ladsgroup.json
  • 20:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 20:35 catrope@deploy1002: Finished scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) (duration: 12m 12s)
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49181 and previous config saved to /var/cache/conftool/dbconfig/20230607-203228-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49180 and previous config saved to /var/cache/conftool/dbconfig/20230607-203146-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T336886)', diff saved to https://phabricator.wikimedia.org/P49179 and previous config saved to /var/cache/conftool/dbconfig/20230607-202733-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 20:24 catrope@deploy1002: catrope: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49178 and previous config saved to /var/cache/conftool/dbconfig/20230607-202408-ladsgroup.json
  • 20:23 catrope@deploy1002: Started scap: Backport for Link to translations of CC BY-SA 4.0 where possible (T319064)
  • 20:18 catrope@deploy1002: Finished scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) (duration: 10m 53s)
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P49177 and previous config saved to /var/cache/conftool/dbconfig/20230607-201640-ladsgroup.json
  • 20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:15 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:15 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:14 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs1016.eqiad.wmnet with reason: attempting WDQS stack on bullseye
  • 20:11 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
  • 20:11 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet
  • 20:09 catrope@deploy1002: catrope and essexigyan: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49176 and previous config saved to /var/cache/conftool/dbconfig/20230607-200902-ladsgroup.json
  • 20:07 catrope@deploy1002: Started scap: Backport for Deploy GDI safety survey to JA and RU wikis. (T337728)
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49175 and previous config saved to /var/cache/conftool/dbconfig/20230607-200134-ladsgroup.json
  • 19:54 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P49174 and previous config saved to /var/cache/conftool/dbconfig/20230607-195356-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T336886)', diff saved to https://phabricator.wikimedia.org/P49173 and previous config saved to /var/cache/conftool/dbconfig/20230607-195316-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49172 and previous config saved to /var/cache/conftool/dbconfig/20230607-195255-ladsgroup.json
  • 19:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 19:41 taavi: manually created 3 global accounts T338197
  • 19:40 bblack: cp*: disabling puppet temporarily out of an abundance of caution
  • 19:40 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 19:40 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49171 and previous config saved to /var/cache/conftool/dbconfig/20230607-193850-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49170 and previous config saved to /var/cache/conftool/dbconfig/20230607-193749-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T336886)', diff saved to https://phabricator.wikimedia.org/P49169 and previous config saved to /var/cache/conftool/dbconfig/20230607-193357-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49168 and previous config saved to /var/cache/conftool/dbconfig/20230607-193326-ladsgroup.json
  • 19:23 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:23 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P49167 and previous config saved to /var/cache/conftool/dbconfig/20230607-192243-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49166 and previous config saved to /var/cache/conftool/dbconfig/20230607-191820-ladsgroup.json
  • 19:16 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool sessionstore in codfw: maintenance
  • 19:11 eevans@cumin1001: START - Cookbook sre.discovery.service-route pool sessionstore in codfw: maintenance
  • 19:11 urandom: (Re)pooling codfw sessionstore — T337426
  • 19:09 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore2001.codfw.wmnet
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49165 and previous config saved to /var/cache/conftool/dbconfig/20230607-190737-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T336886)', diff saved to https://phabricator.wikimedia.org/P49164 and previous config saved to /var/cache/conftool/dbconfig/20230607-190514-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P49163 and previous config saved to /var/cache/conftool/dbconfig/20230607-190314-ladsgroup.json
  • 19:02 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore2001.codfw.wmnet
  • 18:59 pt1979@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 18:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 18:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49162 and previous config saved to /var/cache/conftool/dbconfig/20230607-184808-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49161 and previous config saved to /var/cache/conftool/dbconfig/20230607-184712-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T336886)', diff saved to https://phabricator.wikimedia.org/P49160 and previous config saved to /var/cache/conftool/dbconfig/20230607-184411-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49159 and previous config saved to /var/cache/conftool/dbconfig/20230607-184351-ladsgroup.json
  • 18:41 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49158 and previous config saved to /var/cache/conftool/dbconfig/20230607-183206-ladsgroup.json
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3052.esams.wmnet
  • 18:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49157 and previous config saved to /var/cache/conftool/dbconfig/20230607-182845-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
  • 18:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1135.eqiad.wmnet with reason: T338354
  • 18:22 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3052.esams.wmnet
  • 18:20 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.12 refs T337526 (duration: 06m 05s)
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P49156 and previous config saved to /var/cache/conftool/dbconfig/20230607-181700-ladsgroup.json
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.12 refs T337526
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P49155 and previous config saved to /var/cache/conftool/dbconfig/20230607-181339-ladsgroup.json
  • 18:08 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided) (duration: 00m 33s)
  • 18:07 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@d90d5c8]: (no justification provided)
  • 18:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs2014.codfw.wmnet with OS bullseye
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49154 and previous config saved to /var/cache/conftool/dbconfig/20230607-180154-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49153 and previous config saved to /var/cache/conftool/dbconfig/20230607-175833-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T336886)', diff saved to https://phabricator.wikimedia.org/P49152 and previous config saved to /var/cache/conftool/dbconfig/20230607-175347-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T336886)', diff saved to https://phabricator.wikimedia.org/P49151 and previous config saved to /var/cache/conftool/dbconfig/20230607-175337-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49150 and previous config saved to /var/cache/conftool/dbconfig/20230607-175327-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49149 and previous config saved to /var/cache/conftool/dbconfig/20230607-175316-ladsgroup.json
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=ats-be
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3050.esams.wmnet,service=cdn
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=ats-be
  • 17:50 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp3051.esams.wmnet,service=cdn
  • 17:46 inflatador: bking@wdqs depool wdqs2012 T321605
  • 17:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3051.esams.wmnet
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49148 and previous config saved to /var/cache/conftool/dbconfig/20230607-173821-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49147 and previous config saved to /var/cache/conftool/dbconfig/20230607-173810-ladsgroup.json
  • 17:34 cwhite@cumin2002: dbctl commit (dc=all): 'depool db1135', diff saved to https://phabricator.wikimedia.org/P49146 and previous config saved to /var/cache/conftool/dbconfig/20230607-173453-cwhite.json
  • 17:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3051.esams.wmnet
  • 17:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P49145 and previous config saved to /var/cache/conftool/dbconfig/20230607-172315-ladsgroup.json
  • 17:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P49144 and previous config saved to /var/cache/conftool/dbconfig/20230607-172304-ladsgroup.json
  • 17:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 17:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49143 and previous config saved to /var/cache/conftool/dbconfig/20230607-170808-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49142 and previous config saved to /var/cache/conftool/dbconfig/20230607-170758-ladsgroup.json
  • 17:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2014.codfw.wmnet with OS bullseye
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T336886)', diff saved to https://phabricator.wikimedia.org/P49141 and previous config saved to /var/cache/conftool/dbconfig/20230607-170551-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49140 and previous config saved to /var/cache/conftool/dbconfig/20230607-170530-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49139 and previous config saved to /var/cache/conftool/dbconfig/20230607-170252-ladsgroup.json
  • 17:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49138 and previous config saved to /var/cache/conftool/dbconfig/20230607-165934-ladsgroup.json
  • 16:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:51 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49137 and previous config saved to /var/cache/conftool/dbconfig/20230607-165024-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49135 and previous config saved to /var/cache/conftool/dbconfig/20230607-164428-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P49134 and previous config saved to /var/cache/conftool/dbconfig/20230607-163518-ladsgroup.json
  • 16:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P49133 and previous config saved to /var/cache/conftool/dbconfig/20230607-162922-ladsgroup.json
  • 16:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp3050.esams.wmnet
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:21 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2014']
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49132 and previous config saved to /var/cache/conftool/dbconfig/20230607-162012-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T336886)', diff saved to https://phabricator.wikimedia.org/P49131 and previous config saved to /var/cache/conftool/dbconfig/20230607-161800-ladsgroup.json
  • 16:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49130 and previous config saved to /var/cache/conftool/dbconfig/20230607-161740-ladsgroup.json
  • 16:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp3050.esams.wmnet
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49129 and previous config saved to /var/cache/conftool/dbconfig/20230607-161416-ladsgroup.json
  • 16:13 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:12 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2014']
  • 16:11 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2014']
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T336886)', diff saved to https://phabricator.wikimedia.org/P49128 and previous config saved to /var/cache/conftool/dbconfig/20230607-160912-ladsgroup.json
  • 16:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49127 and previous config saved to /var/cache/conftool/dbconfig/20230607-160851-ladsgroup.json
  • 16:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:04 jbond@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin2002"
  • 16:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49126 and previous config saved to /var/cache/conftool/dbconfig/20230607-160234-ladsgroup.json
  • 16:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
  • 15:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:56 urandom: Beginning (3 hour) generated traffic testing of sessionstore.svc.codfw.wmnet — T337426
  • 15:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49125 and previous config saved to /var/cache/conftool/dbconfig/20230607-155345-ladsgroup.json
  • 15:52 urandom: Upgrading Cassandra to 4.1.1, sessionstore2003 — T337426
  • 15:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
  • 15:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P49124 and previous config saved to /var/cache/conftool/dbconfig/20230607-154727-ladsgroup.json
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 15:44 urandom: Upgrading Cassandra to 4.1.1, sessionstore2002 — T337426
  • 15:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2014.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:42 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
  • 15:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entry for lvs2014 - pt1979@cumin2002"
  • 15:40 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
  • 15:39 moritzm: installing isc-dhcp bugfix updates from the Bullseye 11.7 point release
  • 15:38 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P49123 and previous config saved to /var/cache/conftool/dbconfig/20230607-153839-ladsgroup.json
  • 15:37 jbond@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2001.codfw.wmnet with reason: host reimage
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:33 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49122 and previous config saved to /var/cache/conftool/dbconfig/20230607-153221-ladsgroup.json
  • 15:26 moritzm: rolling restart of FPM on mw canaries to pick up libwebp security updates
  • 15:26 pt1979@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T336886)', diff saved to https://phabricator.wikimedia.org/P49121 and previous config saved to /var/cache/conftool/dbconfig/20230607-152456-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49120 and previous config saved to /var/cache/conftool/dbconfig/20230607-152425-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49119 and previous config saved to /var/cache/conftool/dbconfig/20230607-152333-ladsgroup.json
  • 15:23 elukey: all varnishkafka instances on caching nodes are getting restarted due to https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087 - T337825
  • 15:22 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:22 elukey: re-enable puppet on caching nodes
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 claime: Bumping prewarmparsoid concurrency to 45 in changeprop-jobqueue - T320534
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T336886)', diff saved to https://phabricator.wikimedia.org/P49118 and previous config saved to /var/cache/conftool/dbconfig/20230607-151835-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49117 and previous config saved to /var/cache/conftool/dbconfig/20230607-151815-ladsgroup.json
  • 15:17 moritzm: installing libwebp security updates on buster
  • 15:17 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:17 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:14 urandom: Upgrading Cassandra to 4.1.1, sessionstore2001 — T337426
  • 15:14 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 15:10 elukey: disable puppet on all caching nodes to roll out a varnishkafka change (ref: https://gerrit.wikimedia.org/r/c/operations/puppet/+/928087)
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49116 and previous config saved to /var/cache/conftool/dbconfig/20230607-150919-ladsgroup.json
  • 15:08 jbond@cumin2002: START - Cookbook sre.hosts.reimage for host puppetserver2001.codfw.wmnet with OS bookworm
  • 15:07 eevans@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool sessionstore in codfw: maintenance
  • 15:06 jbond@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver2001.mgmt.codfw.wmnet on all recursors
  • 15:06 jbond@cumin2002: START - Cookbook sre.dns.wipe-cache puppetserver2001.mgmt.codfw.wmnet on all recursors
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49115 and previous config saved to /var/cache/conftool/dbconfig/20230607-150309-ladsgroup.json
  • 15:02 eevans@cumin1001: START - Cookbook sre.discovery.service-route depool sessionstore in codfw: maintenance
  • 15:02 urandom: de-pooling sessionstore/codfw — T337426
  • 14:56 sukhe: homer "cr*-codfw*" commit "Gerrit: 928068 remove decommissioned host lvs2010"
  • 14:54 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:54 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P49114 and previous config saved to /var/cache/conftool/dbconfig/20230607-145413-ladsgroup.json
  • 14:54 moritzm: installing postgresql 13 security updates (clients/libs, server instances all updated already)
  • 14:53 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jbond@cumin1001"
  • 14:51 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 14:49 jbond@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2010.codfw.wmnet
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:48 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P49112 and previous config saved to /var/cache/conftool/dbconfig/20230607-144803-ladsgroup.json
  • 14:43 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
  • 14:40 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_eqiad and A:cp
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49111 and previous config saved to /var/cache/conftool/dbconfig/20230607-143907-ladsgroup.json
  • 14:39 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2010.codfw.wmnet
  • 14:37 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver1001.eqiad.wmnet with reason: host reimage
  • 14:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_eqiad and A:cp
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49110 and previous config saved to /var/cache/conftool/dbconfig/20230607-143256-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49109 and previous config saved to /var/cache/conftool/dbconfig/20230607-143235-ladsgroup.json
  • 14:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49108 and previous config saved to /var/cache/conftool/dbconfig/20230607-143215-ladsgroup.json
  • 14:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T336886)', diff saved to https://phabricator.wikimedia.org/P49107 and previous config saved to /var/cache/conftool/dbconfig/20230607-142756-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49106 and previous config saved to /var/cache/conftool/dbconfig/20230607-142736-ladsgroup.json
  • 14:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:25 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:24 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:23 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49104 and previous config saved to /var/cache/conftool/dbconfig/20230607-141709-ladsgroup.json
  • 14:17 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2006-dev
  • 14:16 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
  • 14:14 aborrero@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet2005-dev
  • 14:14 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 14:14 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2006-dev
  • 14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2006-dev
  • 14:13 aborrero@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudnet2005-dev
  • 14:13 aborrero@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet2005-dev
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49103 and previous config saved to /var/cache/conftool/dbconfig/20230607-141230-ladsgroup.json
  • 14:10 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) (duration: 09m 16s)
  • 14:05 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 14:03 lucaswerkmeister-wmde@deploy1002: d3r1ck01 and lucaswerkmeister-wmde: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P49102 and previous config saved to /var/cache/conftool/dbconfig/20230607-140203-ladsgroup.json
  • 14:01 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable 'multi-line' mode in preg_match() for wikitextToHTML regex (T338264)
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P49101 and previous config saved to /var/cache/conftool/dbconfig/20230607-135724-ladsgroup.json
  • 13:47 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable cache warming jobs for parsoid per default. (T329366) (duration: 10m 27s)
  • 13:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49100 and previous config saved to /var/cache/conftool/dbconfig/20230607-134656-ladsgroup.json
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49099 and previous config saved to /var/cache/conftool/dbconfig/20230607-134218-ladsgroup.json
  • 13:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T336886)', diff saved to https://phabricator.wikimedia.org/P49098 and previous config saved to /var/cache/conftool/dbconfig/20230607-133933-ladsgroup.json
  • 13:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49097 and previous config saved to /var/cache/conftool/dbconfig/20230607-133854-ladsgroup.json
  • 13:38 lucaswerkmeister-wmde@deploy1002: daniel and lucaswerkmeister-wmde: Backport for Enable cache warming jobs for parsoid per default. (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:38 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbproxy1027.eqiad.wmnet']
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T336886)', diff saved to https://phabricator.wikimedia.org/P49096 and previous config saved to /var/cache/conftool/dbconfig/20230607-133725-ladsgroup.json
  • 13:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable cache warming jobs for parsoid per default. (T329366)
  • 13:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49095 and previous config saved to /var/cache/conftool/dbconfig/20230607-133704-ladsgroup.json
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:29 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1024.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49093 and previous config saved to /var/cache/conftool/dbconfig/20230607-132348-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49092 and previous config saved to /var/cache/conftool/dbconfig/20230607-132158-ladsgroup.json
  • 13:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:20 topranks: removing remote vlan configuration from lsw1-f1-eqiad T322937
  • 13:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host dbproxy1027.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:10 ladsgroup@deploy1002: Finished scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" (duration: 07m 11s)
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P49090 and previous config saved to /var/cache/conftool/dbconfig/20230607-130841-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P49089 and previous config saved to /var/cache/conftool/dbconfig/20230607-130651-ladsgroup.json
  • 13:04 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary"" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 13:03 ladsgroup@deploy1002: Started scap: Backport for Revert "Revert "Remove legacy encoding option from dawiktionary""
  • 13:02 cmooney@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937 (duration: 11m 45s)
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49088 and previous config saved to /var/cache/conftool/dbconfig/20230607-125335-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49087 and previous config saved to /var/cache/conftool/dbconfig/20230607-125145-ladsgroup.json
  • 12:51 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 12:50 topranks: Depooling lvs1019 to move link from lsw1-f1-eqiad to ssw1-f1-eqiad
  • 12:50 cmooney@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in eqiad, blocking deploys T322937
  • 12:46 Amir1: mwscript maintenance/storage/moveToExternal.php --iconv DB cluster27 on dawiktionary and svwiktionary (T128155 and T128156)
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T336886)', diff saved to https://phabricator.wikimedia.org/P49086 and previous config saved to /var/cache/conftool/dbconfig/20230607-124543-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49085 and previous config saved to /var/cache/conftool/dbconfig/20230607-123926-ladsgroup.json
  • 12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:37 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
  • 12:36 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet - aborrero@cumin2002"
  • 12:33 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49084 and previous config saved to /var/cache/conftool/dbconfig/20230607-123002-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49083 and previous config saved to /var/cache/conftool/dbconfig/20230607-122420-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49082 and previous config saved to /var/cache/conftool/dbconfig/20230607-121456-ladsgroup.json
  • 12:13 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host puppetserver1001.eqiad.wmnet with OS bookworm
  • 12:12 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver1001.eqiad.wmnet on all recursors
  • 12:12 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver1001.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetserver.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetserver.eqiad.wmnet on all recursors
  • 12:11 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:10 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 12:09 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P49081 and previous config saved to /var/cache/conftool/dbconfig/20230607-120914-ladsgroup.json
  • 12:07 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 12:07 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver1001
  • 12:06 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
  • 12:06 jbond@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host puppetserver2001
  • 12:04 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P49080 and previous config saved to /var/cache/conftool/dbconfig/20230607-115950-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49079 and previous config saved to /var/cache/conftool/dbconfig/20230607-115408-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T336886)', diff saved to https://phabricator.wikimedia.org/P49078 and previous config saved to /var/cache/conftool/dbconfig/20230607-115156-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T336886)', diff saved to https://phabricator.wikimedia.org/P49077 and previous config saved to /var/cache/conftool/dbconfig/20230607-115124-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 11:48 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver2001
  • 11:46 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver2001
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 11:45 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: rename puppetmaster1005 -> puppetserver1001 - jbond@cumin1001"
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49076 and previous config saved to /var/cache/conftool/dbconfig/20230607-114444-ladsgroup.json
  • 11:44 jbond@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host puppetserver1001
  • 11:43 jbond@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host puppetserver1001
  • 11:43 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T336886)', diff saved to https://phabricator.wikimedia.org/P49075 and previous config saved to /var/cache/conftool/dbconfig/20230607-114120-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49074 and previous config saved to /var/cache/conftool/dbconfig/20230607-114059-ladsgroup.json
  • 11:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:30 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2005
  • 11:30 jbond@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster1005
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:29 jbond@cumin2002: START - Cookbook sre.dns.netbox
  • 11:27 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster1005 decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49073 and previous config saved to /var/cache/conftool/dbconfig/20230607-112553-ladsgroup.json
  • 11:24 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 11:24 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2005
  • 11:23 jbond@cumin2002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts puppetmaster1005
  • 11:22 jbond@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
  • 11:17 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts puppetmaster1005
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P49072 and previous config saved to /var/cache/conftool/dbconfig/20230607-111047-ladsgroup.json
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49071 and previous config saved to /var/cache/conftool/dbconfig/20230607-105541-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49070 and previous config saved to /var/cache/conftool/dbconfig/20230607-105215-ladsgroup.json
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49069 and previous config saved to /var/cache/conftool/dbconfig/20230607-105154-ladsgroup.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49068 and previous config saved to /var/cache/conftool/dbconfig/20230607-103648-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P49066 and previous config saved to /var/cache/conftool/dbconfig/20230607-102141-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49065 and previous config saved to /var/cache/conftool/dbconfig/20230607-100635-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T336886)', diff saved to https://phabricator.wikimedia.org/P49064 and previous config saved to /var/cache/conftool/dbconfig/20230607-100307-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49063 and previous config saved to /var/cache/conftool/dbconfig/20230607-100247-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49062 and previous config saved to /var/cache/conftool/dbconfig/20230607-094740-ladsgroup.json
  • 09:33 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling reboot on A:ncredir
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P49061 and previous config saved to /var/cache/conftool/dbconfig/20230607-093234-ladsgroup.json
  • 09:21 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49060 and previous config saved to /var/cache/conftool/dbconfig/20230607-091728-ladsgroup.json
  • 09:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T336886)', diff saved to https://phabricator.wikimedia.org/P49059 and previous config saved to /var/cache/conftool/dbconfig/20230607-091402-ladsgroup.json
  • 09:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 09:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49058 and previous config saved to /var/cache/conftool/dbconfig/20230607-091341-ladsgroup.json
  • 09:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:06 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:00 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_eqiad and A:cp
  • 08:59 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_eqiad and A:cp
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49057 and previous config saved to /var/cache/conftool/dbconfig/20230607-085835-ladsgroup.json
  • 08:49 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P49056 and previous config saved to /var/cache/conftool/dbconfig/20230607-084329-ladsgroup.json
  • 08:34 fabfur: disable puppet on A:cp-eqiad for varnish <-> haproxy port 80 swap
  • 08:29 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling reboot on A:ncredir
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49055 and previous config saved to /var/cache/conftool/dbconfig/20230607-082823-ladsgroup.json
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T336886)', diff saved to https://phabricator.wikimedia.org/P49054 and previous config saved to /var/cache/conftool/dbconfig/20230607-082500-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49053 and previous config saved to /var/cache/conftool/dbconfig/20230607-082434-ladsgroup.json
  • 08:22 moritzm: uploaded ruby 2.5.5-3+deb10u5+wmf1 to apt.wikimedia.org, unbreaking Puppet runs after latest Ruby update for Buster T338294
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49052 and previous config saved to /var/cache/conftool/dbconfig/20230607-080928-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P49051 and previous config saved to /var/cache/conftool/dbconfig/20230607-075422-ladsgroup.json
  • 07:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49050 and previous config saved to /var/cache/conftool/dbconfig/20230607-073916-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T336886)', diff saved to https://phabricator.wikimedia.org/P49049 and previous config saved to /var/cache/conftool/dbconfig/20230607-073554-ladsgroup.json
  • 07:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49048 and previous config saved to /var/cache/conftool/dbconfig/20230607-073533-ladsgroup.json
  • 07:22 kartik@deploy1002: Finished scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) (duration: 18m 06s)
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49047 and previous config saved to /var/cache/conftool/dbconfig/20230607-072027-ladsgroup.json
  • 07:06 kartik@deploy1002: kartik: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P49046 and previous config saved to /var/cache/conftool/dbconfig/20230607-070521-ladsgroup.json
  • 07:04 kartik@deploy1002: Started scap: Backport for Use direct Parsoid in Small and Medium Wikis for Content Translation (T337922)
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49045 and previous config saved to /var/cache/conftool/dbconfig/20230607-065015-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T336886)', diff saved to https://phabricator.wikimedia.org/P49044 and previous config saved to /var/cache/conftool/dbconfig/20230607-064652-ladsgroup.json
  • 06:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 06:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49043 and previous config saved to /var/cache/conftool/dbconfig/20230607-064631-ladsgroup.json
  • 06:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49042 and previous config saved to /var/cache/conftool/dbconfig/20230607-064215-ladsgroup.json
  • 06:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49041 and previous config saved to /var/cache/conftool/dbconfig/20230607-063125-ladsgroup.json
  • 06:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49040 and previous config saved to /var/cache/conftool/dbconfig/20230607-062709-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P49039 and previous config saved to /var/cache/conftool/dbconfig/20230607-061618-ladsgroup.json
  • 06:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P49038 and previous config saved to /var/cache/conftool/dbconfig/20230607-061203-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49037 and previous config saved to /var/cache/conftool/dbconfig/20230607-060112-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T336886)', diff saved to https://phabricator.wikimedia.org/P49036 and previous config saved to /var/cache/conftool/dbconfig/20230607-055746-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49035 and previous config saved to /var/cache/conftool/dbconfig/20230607-055726-ladsgroup.json
  • 05:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49034 and previous config saved to /var/cache/conftool/dbconfig/20230607-055655-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T336886)', diff saved to https://phabricator.wikimedia.org/P49033 and previous config saved to /var/cache/conftool/dbconfig/20230607-055320-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49032 and previous config saved to /var/cache/conftool/dbconfig/20230607-055259-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49031 and previous config saved to /var/cache/conftool/dbconfig/20230607-054220-ladsgroup.json
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49030 and previous config saved to /var/cache/conftool/dbconfig/20230607-053753-ladsgroup.json
  • 05:28 kart_: Updated cxserver to 2023-06-07-044025-production (T337290, T337669, T337834)
  • 05:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P49029 and previous config saved to /var/cache/conftool/dbconfig/20230607-052713-ladsgroup.json
  • 05:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:22 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P49028 and previous config saved to /var/cache/conftool/dbconfig/20230607-052247-ladsgroup.json
  • 05:22 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:17 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49027 and previous config saved to /var/cache/conftool/dbconfig/20230607-051207-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T336886)', diff saved to https://phabricator.wikimedia.org/P49026 and previous config saved to /var/cache/conftool/dbconfig/20230607-050844-ladsgroup.json
  • 05:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 05:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 05:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49025 and previous config saved to /var/cache/conftool/dbconfig/20230607-050823-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49024 and previous config saved to /var/cache/conftool/dbconfig/20230607-050740-ladsgroup.json
  • 05:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49023 and previous config saved to /var/cache/conftool/dbconfig/20230607-050258-ladsgroup.json
  • 05:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:02 kart_: Updated MinT to 2023-06-06-120533-production (T337910, T337686, T337708)
  • 05:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49022 and previous config saved to /var/cache/conftool/dbconfig/20230607-050237-ladsgroup.json
  • 04:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P49021 and previous config saved to /var/cache/conftool/dbconfig/20230607-045317-ladsgroup.json
  • 04:51 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 04:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49020 and previous config saved to /var/cache/conftool/dbconfig/20230607-044731-ladsgroup.json
  • 04:45 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 04:39 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 04:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P49019 and previous config saved to /var/cache/conftool/dbconfig/20230607-043810-ladsgroup.json
  • 04:36 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 04:32 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 04:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P49018 and previous config saved to /var/cache/conftool/dbconfig/20230607-043225-ladsgroup.json
  • 04:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 04:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49017 and previous config saved to /var/cache/conftool/dbconfig/20230607-042304-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T336886)', diff saved to https://phabricator.wikimedia.org/P49016 and previous config saved to /var/cache/conftool/dbconfig/20230607-042040-ladsgroup.json
  • 04:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 04:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 04:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49015 and previous config saved to /var/cache/conftool/dbconfig/20230607-041719-ladsgroup.json
  • 04:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 04:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 04:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49014 and previous config saved to /var/cache/conftool/dbconfig/20230607-041357-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T336886)', diff saved to https://phabricator.wikimedia.org/P49013 and previous config saved to /var/cache/conftool/dbconfig/20230607-041347-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49012 and previous config saved to /var/cache/conftool/dbconfig/20230607-041326-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P49011 and previous config saved to /var/cache/conftool/dbconfig/20230607-035851-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49010 and previous config saved to /var/cache/conftool/dbconfig/20230607-035820-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P49009 and previous config saved to /var/cache/conftool/dbconfig/20230607-034345-ladsgroup.json
  • 03:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P49008 and previous config saved to /var/cache/conftool/dbconfig/20230607-034314-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49007 and previous config saved to /var/cache/conftool/dbconfig/20230607-032839-ladsgroup.json
  • 03:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49006 and previous config saved to /var/cache/conftool/dbconfig/20230607-032808-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1214 (T336886)', diff saved to https://phabricator.wikimedia.org/P49005 and previous config saved to /var/cache/conftool/dbconfig/20230607-032522-ladsgroup.json
  • 03:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 03:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P49004 and previous config saved to /var/cache/conftool/dbconfig/20230607-032501-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P49003 and previous config saved to /var/cache/conftool/dbconfig/20230607-032428-ladsgroup.json
  • 03:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P49002 and previous config saved to /var/cache/conftool/dbconfig/20230607-032407-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P49001 and previous config saved to /var/cache/conftool/dbconfig/20230607-030955-ladsgroup.json
  • 03:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P49000 and previous config saved to /var/cache/conftool/dbconfig/20230607-030901-ladsgroup.json
  • 02:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P48999 and previous config saved to /var/cache/conftool/dbconfig/20230607-025449-ladsgroup.json
  • 02:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P48998 and previous config saved to /var/cache/conftool/dbconfig/20230607-025355-ladsgroup.json
  • 02:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P48997 and previous config saved to /var/cache/conftool/dbconfig/20230607-023943-ladsgroup.json
  • 02:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P48996 and previous config saved to /var/cache/conftool/dbconfig/20230607-023848-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T336886)', diff saved to https://phabricator.wikimedia.org/P48995 and previous config saved to /var/cache/conftool/dbconfig/20230607-023624-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T336886)', diff saved to https://phabricator.wikimedia.org/P48994 and previous config saved to /var/cache/conftool/dbconfig/20230607-023613-ladsgroup.json
  • 02:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 02:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 02:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48993 and previous config saved to /var/cache/conftool/dbconfig/20230607-023603-ladsgroup.json
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 02:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48992 and previous config saved to /var/cache/conftool/dbconfig/20230607-023537-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P48991 and previous config saved to /var/cache/conftool/dbconfig/20230607-022057-ladsgroup.json
  • 02:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P48990 and previous config saved to /var/cache/conftool/dbconfig/20230607-022031-ladsgroup.json
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P48989 and previous config saved to /var/cache/conftool/dbconfig/20230607-020550-ladsgroup.json
  • 02:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P48988 and previous config saved to /var/cache/conftool/dbconfig/20230607-020518-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48987 and previous config saved to /var/cache/conftool/dbconfig/20230607-015043-ladsgroup.json
  • 01:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48986 and previous config saved to /var/cache/conftool/dbconfig/20230607-015012-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T336886)', diff saved to https://phabricator.wikimedia.org/P48985 and previous config saved to /var/cache/conftool/dbconfig/20230607-014635-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T336886)', diff saved to https://phabricator.wikimedia.org/P48984 and previous config saved to /var/cache/conftool/dbconfig/20230607-014626-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48983 and previous config saved to /var/cache/conftool/dbconfig/20230607-014614-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48982 and previous config saved to /var/cache/conftool/dbconfig/20230607-014605-ladsgroup.json
  • 01:39 sukhe: run authdns-update: T338280
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48981 and previous config saved to /var/cache/conftool/dbconfig/20230607-013108-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48980 and previous config saved to /var/cache/conftool/dbconfig/20230607-013059-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P48979 and previous config saved to /var/cache/conftool/dbconfig/20230607-011602-ladsgroup.json
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P48978 and previous config saved to /var/cache/conftool/dbconfig/20230607-011553-ladsgroup.json
  • 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48977 and previous config saved to /var/cache/conftool/dbconfig/20230607-010055-ladsgroup.json
  • 01:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48976 and previous config saved to /var/cache/conftool/dbconfig/20230607-010047-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T336886)', diff saved to https://phabricator.wikimedia.org/P48975 and previous config saved to /var/cache/conftool/dbconfig/20230607-005722-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T336886)', diff saved to https://phabricator.wikimedia.org/P48974 and previous config saved to /var/cache/conftool/dbconfig/20230607-005713-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 00:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 00:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48973 and previous config saved to /var/cache/conftool/dbconfig/20230607-005654-ladsgroup.json
  • 00:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 00:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 00:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48972 and previous config saved to /var/cache/conftool/dbconfig/20230607-005155-ladsgroup.json
  • 00:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48971 and previous config saved to /var/cache/conftool/dbconfig/20230607-004148-ladsgroup.json
  • 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48970 and previous config saved to /var/cache/conftool/dbconfig/20230607-003649-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P48969 and previous config saved to /var/cache/conftool/dbconfig/20230607-002642-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315', diff saved to https://phabricator.wikimedia.org/P48968 and previous config saved to /var/cache/conftool/dbconfig/20230607-002143-ladsgroup.json
  • 00:14 urbanecm: Deployed security patch for T338276
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48967 and previous config saved to /var/cache/conftool/dbconfig/20230607-001136-ladsgroup.json
  • 00:08 urbanecm: Deployed security patch for T338276
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T336886)', diff saved to https://phabricator.wikimedia.org/P48966 and previous config saved to /var/cache/conftool/dbconfig/20230607-000814-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 00:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 00:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48965 and previous config saved to /var/cache/conftool/dbconfig/20230607-000754-ladsgroup.json
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48964 and previous config saved to /var/cache/conftool/dbconfig/20230607-000637-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48963 and previous config saved to /var/cache/conftool/dbconfig/20230607-000337-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48962 and previous config saved to /var/cache/conftool/dbconfig/20230607-000316-ladsgroup.json
  • 00:01 urbanecm: Deploying security patch for T338276

2023-06-06

  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48961 and previous config saved to /var/cache/conftool/dbconfig/20230606-235248-ladsgroup.json
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48960 and previous config saved to /var/cache/conftool/dbconfig/20230606-234810-ladsgroup.json
  • 23:42 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 23:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P48959 and previous config saved to /var/cache/conftool/dbconfig/20230606-233742-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P48958 and previous config saved to /var/cache/conftool/dbconfig/20230606-233304-ladsgroup.json
  • 23:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48955 and previous config saved to /var/cache/conftool/dbconfig/20230606-232235-ladsgroup.json
  • 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
  • 23:19 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-a1-codfw - pt1979@cumin2002"
  • 23:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T336886)', diff saved to https://phabricator.wikimedia.org/P48954 and previous config saved to /var/cache/conftool/dbconfig/20230606-231913-ladsgroup.json
  • 23:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 23:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48953 and previous config saved to /var/cache/conftool/dbconfig/20230606-231853-ladsgroup.json
  • 23:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48952 and previous config saved to /var/cache/conftool/dbconfig/20230606-231758-ladsgroup.json
  • 23:16 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:16 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-a1-codfw.mgmt.codfw.wmnet
  • 23:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:16 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 23:15 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T336886)', diff saved to https://phabricator.wikimedia.org/P48951 and previous config saved to /var/cache/conftool/dbconfig/20230606-231408-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 23:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48950 and previous config saved to /var/cache/conftool/dbconfig/20230606-231347-ladsgroup.json
  • 23:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48949 and previous config saved to /var/cache/conftool/dbconfig/20230606-230347-ladsgroup.json
  • 22:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48948 and previous config saved to /var/cache/conftool/dbconfig/20230606-225841-ladsgroup.json
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 22:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 22:48 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P48947 and previous config saved to /var/cache/conftool/dbconfig/20230606-224841-ladsgroup.json
  • 22:48 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P48946 and previous config saved to /var/cache/conftool/dbconfig/20230606-224334-ladsgroup.json
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48945 and previous config saved to /var/cache/conftool/dbconfig/20230606-223335-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T336886)', diff saved to https://phabricator.wikimedia.org/P48944 and previous config saved to /var/cache/conftool/dbconfig/20230606-223011-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48943 and previous config saved to /var/cache/conftool/dbconfig/20230606-222950-ladsgroup.json
  • 22:29 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48942 and previous config saved to /var/cache/conftool/dbconfig/20230606-222828-ladsgroup.json
  • 22:27 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp everywhere (T299954) (duration: 07m 33s)
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T336886)', diff saved to https://phabricator.wikimedia.org/P48941 and previous config saved to /var/cache/conftool/dbconfig/20230606-222534-ladsgroup.json
  • 22:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 22:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 22:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48940 and previous config saved to /var/cache/conftool/dbconfig/20230606-222513-ladsgroup.json
  • 22:21 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp everywhere (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:19 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp everywhere (T299954)
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48939 and previous config saved to /var/cache/conftool/dbconfig/20230606-221444-ladsgroup.json
  • 22:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48938 and previous config saved to /var/cache/conftool/dbconfig/20230606-221007-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P48937 and previous config saved to /var/cache/conftool/dbconfig/20230606-215938-ladsgroup.json
  • 21:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P48936 and previous config saved to /var/cache/conftool/dbconfig/20230606-215501-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48935 and previous config saved to /var/cache/conftool/dbconfig/20230606-214432-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48934 and previous config saved to /var/cache/conftool/dbconfig/20230606-214109-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 21:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48933 and previous config saved to /var/cache/conftool/dbconfig/20230606-214048-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48932 and previous config saved to /var/cache/conftool/dbconfig/20230606-213954-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T336886)', diff saved to https://phabricator.wikimedia.org/P48931 and previous config saved to /var/cache/conftool/dbconfig/20230606-213702-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48930 and previous config saved to /var/cache/conftool/dbconfig/20230606-213641-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48929 and previous config saved to /var/cache/conftool/dbconfig/20230606-212542-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48928 and previous config saved to /var/cache/conftool/dbconfig/20230606-212135-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P48927 and previous config saved to /var/cache/conftool/dbconfig/20230606-211036-ladsgroup.json
  • 21:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P48926 and previous config saved to /var/cache/conftool/dbconfig/20230606-210629-ladsgroup.json
  • 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 21:03 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48925 and previous config saved to /var/cache/conftool/dbconfig/20230606-205530-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T336886)', diff saved to https://phabricator.wikimedia.org/P48924 and previous config saved to /var/cache/conftool/dbconfig/20230606-205206-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 20:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 20:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48923 and previous config saved to /var/cache/conftool/dbconfig/20230606-205123-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 20:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48922 and previous config saved to /var/cache/conftool/dbconfig/20230606-205002-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T336886)', diff saved to https://phabricator.wikimedia.org/P48921 and previous config saved to /var/cache/conftool/dbconfig/20230606-204527-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48920 and previous config saved to /var/cache/conftool/dbconfig/20230606-204506-ladsgroup.json
  • 20:41 urbanecm@deploy1002: Finished scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) (duration: 07m 23s)
  • 20:35 urbanecm@deploy1002: urbanecm: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48919 and previous config saved to /var/cache/conftool/dbconfig/20230606-203456-ladsgroup.json
  • 20:34 urbanecm@deploy1002: Started scap: Backport for PersonalizedPraiseLogger: Only include mentee_id if not null (T338078), PersonalizedPraiseLogger: Only include mentee_id if not null (T338078)
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48917 and previous config saved to /var/cache/conftool/dbconfig/20230606-203000-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48916 and previous config saved to /var/cache/conftool/dbconfig/20230606-201950-ladsgroup.json
  • 20:16 mutante: miscweb1003, miscweb2003 - rm -rf /srv/org/wikimedia/sitemaps after removing httpd virtual host T338064
  • 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P48915 and previous config saved to /var/cache/conftool/dbconfig/20230606-201454-ladsgroup.json
  • 20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 20:09 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48914 and previous config saved to /var/cache/conftool/dbconfig/20230606-200444-ladsgroup.json
  • 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48913 and previous config saved to /var/cache/conftool/dbconfig/20230606-195948-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T336886)', diff saved to https://phabricator.wikimedia.org/P48912 and previous config saved to /var/cache/conftool/dbconfig/20230606-195557-ladsgroup.json
  • 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48911 and previous config saved to /var/cache/conftool/dbconfig/20230606-195320-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48910 and previous config saved to /var/cache/conftool/dbconfig/20230606-193814-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P48909 and previous config saved to /var/cache/conftool/dbconfig/20230606-192308-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48908 and previous config saved to /var/cache/conftool/dbconfig/20230606-190802-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T336886)', diff saved to https://phabricator.wikimedia.org/P48907 and previous config saved to /var/cache/conftool/dbconfig/20230606-190420-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T336886)', diff saved to https://phabricator.wikimedia.org/P48906 and previous config saved to /var/cache/conftool/dbconfig/20230606-190402-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 19:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 18:10 mutante: disabling https://sitemaps.wikimedia.org - T338064 T332101
  • 18:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.12 refs T337526
  • 18:01 sukhe: cumin 'A:cp-text' 'enable-puppet "CR 926611" && run-puppet-agent -q'
  • 18:01 sukhe: re-enable puppet on A:cp-text and force puppet run: T338064
  • 17:54 sukhe: enable puppet on cp4037 to test CR 926611
  • 17:50 sukhe: disable puppet on A:cp-text to roll out CR 926611
  • 17:39 sukhe: sudo cumin 'P:ntp' 'enable-puppet "testing CR 926598" && run-puppet-agent'
  • 17:27 sukhe: sudo cumin 'P:ntp' 'disable-puppet "testing CR 926598"'
  • 17:05 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 17:04 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 17:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 17:01 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:51 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:41 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:40 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:39 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:37 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:30 sukhe: low-traffic/codfw: set routing-options static route 10.2.1.0/24 next-hop 10.192.32.14
  • 16:27 sukhe: restart pybal on lvs2013 to remove bgp-med override
  • 16:23 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:12 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 16:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:06 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:03 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48904 and previous config saved to /var/cache/conftool/dbconfig/20230606-160151-ladsgroup.json
  • 15:54 jbond@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 15:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:52 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48902 and previous config saved to /var/cache/conftool/dbconfig/20230606-154645-ladsgroup.json
  • 15:46 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:40 cdanis@deploy1002: Finished scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" (duration: 08m 13s)
  • 15:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:37 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:35 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:35 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:34 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:34 cdanis@deploy1002: cdanis and otto: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:32 zabe: purge wikimaniawiki logos # T337044
  • 15:32 cdanis@deploy1002: Started scap: Backport for Revert "EventStreamConfig - development.network.probe- disable canary events and hadoop ingestion"
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P48901 and previous config saved to /var/cache/conftool/dbconfig/20230606-153139-ladsgroup.json
  • 15:30 zabe@deploy1002: Finished scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) (duration: 08m 02s)
  • 15:26 sukhe: homer "cr*-codfw*" commit "Gerrit: 927725 add new LVS host lvs2013" : T326767
  • 15:24 zabe@deploy1002: robertsky and zabe: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:22 zabe@deploy1002: Started scap: Backport for Change project logo for Wikimania to Wikimania 2023 version (T337044)
  • 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs2013
  • 15:21 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2013
  • 15:20 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 15:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48900 and previous config saved to /var/cache/conftool/dbconfig/20230606-151633-ladsgroup.json
  • 15:12 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_esams and A:cp
  • 15:08 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 15:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:06 mforns@deploy1002: Finished deploy [airflow-dags/analytics@72d9b87]: (no justification provided) (duration: 00m 10s)
  • 15:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:06 mforns@deploy1002: Started deploy [airflow-dags/analytics@72d9b87]: (no justification provided)
  • 15:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 15:02 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:02 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T336886)', diff saved to https://phabricator.wikimedia.org/P48899 and previous config saved to /var/cache/conftool/dbconfig/20230606-150141-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48898 and previous config saved to /var/cache/conftool/dbconfig/20230606-150120-ladsgroup.json
  • 15:00 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 14:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 14:56 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
  • 14:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Change entries for moved links eqiad row e f switches - cmooney@cumin1001"
  • 14:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
  • 14:49 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48897 and previous config saved to /var/cache/conftool/dbconfig/20230606-144614-ladsgroup.json
  • 14:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 14:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P48896 and previous config saved to /var/cache/conftool/dbconfig/20230606-143107-ladsgroup.json
  • 14:25 oblivian@deploy1002: Finished scap: Backport for Load and enable parsoid everywhere (T334980) (duration: 15m 00s)
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48895 and previous config saved to /var/cache/conftool/dbconfig/20230606-141601-ladsgroup.json
  • 14:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
  • 14:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 14:12 oblivian@deploy1002: oblivian: Backport for Load and enable parsoid everywhere (T334980) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:10 oblivian@deploy1002: Started scap: Backport for Load and enable parsoid everywhere (T334980)
  • 14:08 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 14:06 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
  • 14:06 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1,3]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS bullseye
  • 14:03 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS bullseye
  • 14:01 oblivian@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) (duration: 07m 57s)
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48894 and previous config saved to /var/cache/conftool/dbconfig/20230606-140051-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48893 and previous config saved to /var/cache/conftool/dbconfig/20230606-140030-ladsgroup.json
  • 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 780 hosts
  • 13:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 780 hosts
  • 13:58 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AndyRussG out of all services on: 1259 hosts
  • 13:57 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AndyRussG out of all services on: 1259 hosts
  • 13:55 oblivian@deploy1002: oblivian and daniel: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:53 oblivian@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on enwiki (T329366)
  • 13:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 13:50 oblivian@deploy1002: Finished scap: Backport for Drop wmgMemoryLimitParsoid from IS.php (duration: 07m 21s)
  • 13:49 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48891 and previous config saved to /var/cache/conftool/dbconfig/20230606-134524-ladsgroup.json
  • 13:45 oblivian@deploy1002: oblivian: Backport for Drop wmgMemoryLimitParsoid from IS.php synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:43 oblivian@deploy1002: Started scap: Backport for Drop wmgMemoryLimitParsoid from IS.php
  • 13:41 oblivian@deploy1002: Finished scap: Backport for Raise memory limit to match parsoid (T334980) (duration: 07m 53s)
  • 13:41 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 13:41 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 13:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
  • 13:35 oblivian@deploy1002: oblivian: Backport for Raise memory limit to match parsoid (T334980) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:34 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt,lsw1-f[1-2]-eqiad.mgmt with reason: Migrate lsw1-f2-eqiad uplinks to spine
  • 13:33 oblivian@deploy1002: Started scap: Backport for Raise memory limit to match parsoid (T334980)
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P48890 and previous config saved to /var/cache/conftool/dbconfig/20230606-133018-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48889 and previous config saved to /var/cache/conftool/dbconfig/20230606-131512-ladsgroup.json
  • 13:11 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 13:06 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: EventStreamConfig - Disable canary events and hadoop ingestion for development.network.probe - T332024 (duration: 07m 17s)
  • 13:00 eoghan@cumin1001: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48888 and previous config saved to /var/cache/conftool/dbconfig/20230606-125944-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48887 and previous config saved to /var/cache/conftool/dbconfig/20230606-125923-ladsgroup.json
  • 12:56 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_esams and A:cp
  • 12:55 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS bullseye
  • 12:53 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS bullseye
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48886 and previous config saved to /var/cache/conftool/dbconfig/20230606-124417-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P48885 and previous config saved to /var/cache/conftool/dbconfig/20230606-122911-ladsgroup.json
  • 12:21 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 10s)
  • 12:19 cgoubert@deploy1002: Started scap: (no justification provided)
  • 12:19 claime: redeploying 927218 to mw-on-k8s - T338121
  • 12:15 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48884 and previous config saved to /var/cache/conftool/dbconfig/20230606-121405-ladsgroup.json
  • 12:09 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 12:00 kamila@deploy1002: Finished scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) (duration: 08m 54s)
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T336886)', diff saved to https://phabricator.wikimedia.org/P48881 and previous config saved to /var/cache/conftool/dbconfig/20230606-115911-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48880 and previous config saved to /var/cache/conftool/dbconfig/20230606-115833-ladsgroup.json
  • 11:53 kamila@deploy1002: kamila and klausman: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 11:51 kamila@deploy1002: Started scap: Backport for OAuthRateLimiter: Add rate limiting class for WME using LiftWing (T338121)
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48879 and previous config saved to /var/cache/conftool/dbconfig/20230606-114327-ladsgroup.json
  • 11:38 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P48878 and previous config saved to /var/cache/conftool/dbconfig/20230606-112819-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48877 and previous config saved to /var/cache/conftool/dbconfig/20230606-111313-ladsgroup.json
  • 11:03 eoghan@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrading Gitlab to 15.10.8
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T336886)', diff saved to https://phabricator.wikimedia.org/P48876 and previous config saved to /var/cache/conftool/dbconfig/20230606-105756-ladsgroup.json
  • 10:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 10:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48875 and previous config saved to /var/cache/conftool/dbconfig/20230606-105724-ladsgroup.json
  • 10:53 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 10:53 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 10:52 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 10:51 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) (duration: 07m 03s)
  • 10:51 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 10:50 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 10:50 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 10:50 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 10:50 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 10:46 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:44 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group1 wikis (T299954)
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48874 and previous config saved to /var/cache/conftool/dbconfig/20230606-104218-ladsgroup.json
  • 10:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 10:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 10:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P48873 and previous config saved to /var/cache/conftool/dbconfig/20230606-102712-ladsgroup.json
  • 10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 10:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:20 mwpresync@deploy1002: Pruned MediaWiki: 1.41.0-wmf.10 (duration: 02m 18s)
  • 10:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:19 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:17 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.12 refs T337526 (duration: 56m 25s)
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48872 and previous config saved to /var/cache/conftool/dbconfig/20230606-101205-ladsgroup.json
  • 10:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:02 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 10:01 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 10:00 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:59 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T336886)', diff saved to https://phabricator.wikimedia.org/P48871 and previous config saved to /var/cache/conftool/dbconfig/20230606-095512-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48870 and previous config saved to /var/cache/conftool/dbconfig/20230606-095451-ladsgroup.json
  • 09:41 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:41 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48869 and previous config saved to /var/cache/conftool/dbconfig/20230606-093945-ladsgroup.json
  • 09:34 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
  • 09:31 fabfur@cumin1001: END (FAIL) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=1) rolling custom on A:cp-text_esams and A:cp
  • 09:27 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_esams and A:cp
  • 09:27 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:26 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P48867 and previous config saved to /var/cache/conftool/dbconfig/20230606-092439-ladsgroup.json
  • 09:21 mwpresync@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.12 refs T337526
  • 09:18 jynus: running systemctl start train-presync
  • 09:16 vgutierrez: restarting acme-chief and nginx on acme-chief instances
  • 09:11 claime: Building production images - T338014
  • 09:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48866 and previous config saved to /var/cache/conftool/dbconfig/20230606-090933-ladsgroup.json
  • 08:59 urbanecm: deploy1002: run /usr/local/sbin/fix-staging-perms (T338205)
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T336886)', diff saved to https://phabricator.wikimedia.org/P48865 and previous config saved to /var/cache/conftool/dbconfig/20230606-085337-ladsgroup.json
  • 08:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48864 and previous config saved to /var/cache/conftool/dbconfig/20230606-085317-ladsgroup.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 08:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 08:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48863 and previous config saved to /var/cache/conftool/dbconfig/20230606-083810-ladsgroup.json
  • 08:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P48861 and previous config saved to /var/cache/conftool/dbconfig/20230606-082304-ladsgroup.json
  • 08:15 moritzm: installing openssl security updates on bullseye
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48860 and previous config saved to /var/cache/conftool/dbconfig/20230606-080758-ladsgroup.json
  • 07:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T336886)', diff saved to https://phabricator.wikimedia.org/P48859 and previous config saved to /var/cache/conftool/dbconfig/20230606-075210-ladsgroup.json
  • 07:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48858 and previous config saved to /var/cache/conftool/dbconfig/20230606-075149-ladsgroup.json
  • 07:47 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_esams and A:cp
  • 07:42 dcausse@deploy1002: Finished scap: Backport for ttm: use new config option to separate readable and writable services (T322284) (duration: 15m 20s)
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48857 and previous config saved to /var/cache/conftool/dbconfig/20230606-073643-ladsgroup.json
  • 07:28 dcausse@deploy1002: dcausse: Backport for ttm: use new config option to separate readable and writable services (T322284) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:27 dcausse@deploy1002: Started scap: Backport for ttm: use new config option to separate readable and writable services (T322284)
  • 07:22 kharlan@deploy1002: Finished scap: Backport for checkuser: Disable client hints feature by default (T337944) (duration: 08m 14s)
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P48856 and previous config saved to /var/cache/conftool/dbconfig/20230606-072137-ladsgroup.json
  • 07:16 kharlan@deploy1002: kharlan: Backport for checkuser: Disable client hints feature by default (T337944) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:14 kharlan@deploy1002: Started scap: Backport for checkuser: Disable client hints feature by default (T337944)
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48855 and previous config saved to /var/cache/conftool/dbconfig/20230606-070631-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T336886)', diff saved to https://phabricator.wikimedia.org/P48854 and previous config saved to /var/cache/conftool/dbconfig/20230606-065057-ladsgroup.json
  • 06:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48853 and previous config saved to /var/cache/conftool/dbconfig/20230606-060807-ladsgroup.json
  • 05:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48852 and previous config saved to /var/cache/conftool/dbconfig/20230606-055301-ladsgroup.json
  • 05:50 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'configure' for AS: 2518
  • 05:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
  • 05:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 2518
  • 05:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2518
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P48851 and previous config saved to /var/cache/conftool/dbconfig/20230606-053755-ladsgroup.json
  • 05:34 Amir1: ladsgroup@clouddb1021:/srv/sqldata.s1$ sudo rm db1196* (T337961)
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48850 and previous config saved to /var/cache/conftool/dbconfig/20230606-052249-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T336886)', diff saved to https://phabricator.wikimedia.org/P48849 and previous config saved to /var/cache/conftool/dbconfig/20230606-051938-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48848 and previous config saved to /var/cache/conftool/dbconfig/20230606-051918-ladsgroup.json
  • 05:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48847 and previous config saved to /var/cache/conftool/dbconfig/20230606-050410-ladsgroup.json
  • 04:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P48846 and previous config saved to /var/cache/conftool/dbconfig/20230606-044904-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48845 and previous config saved to /var/cache/conftool/dbconfig/20230606-043358-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T336886)', diff saved to https://phabricator.wikimedia.org/P48844 and previous config saved to /var/cache/conftool/dbconfig/20230606-043047-ladsgroup.json
  • 04:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 04:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48843 and previous config saved to /var/cache/conftool/dbconfig/20230606-043026-ladsgroup.json
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48842 and previous config saved to /var/cache/conftool/dbconfig/20230606-041520-ladsgroup.json
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P48841 and previous config saved to /var/cache/conftool/dbconfig/20230606-040013-ladsgroup.json
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48840 and previous config saved to /var/cache/conftool/dbconfig/20230606-034506-ladsgroup.json
  • 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T336886)', diff saved to https://phabricator.wikimedia.org/P48839 and previous config saved to /var/cache/conftool/dbconfig/20230606-034256-ladsgroup.json
  • 03:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 03:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 03:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48838 and previous config saved to /var/cache/conftool/dbconfig/20230606-034235-ladsgroup.json
  • 03:32 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 03:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 03:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 03:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48837 and previous config saved to /var/cache/conftool/dbconfig/20230606-032729-ladsgroup.json
  • 03:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P48836 and previous config saved to /var/cache/conftool/dbconfig/20230606-031223-ladsgroup.json
  • 02:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48835 and previous config saved to /var/cache/conftool/dbconfig/20230606-025717-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T336886)', diff saved to https://phabricator.wikimedia.org/P48834 and previous config saved to /var/cache/conftool/dbconfig/20230606-025507-ladsgroup.json
  • 02:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48833 and previous config saved to /var/cache/conftool/dbconfig/20230606-021622-ladsgroup.json
  • 02:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 02:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48832 and previous config saved to /var/cache/conftool/dbconfig/20230606-020616-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48831 and previous config saved to /var/cache/conftool/dbconfig/20230606-020116-ladsgroup.json
  • 01:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48830 and previous config saved to /var/cache/conftool/dbconfig/20230606-015110-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P48829 and previous config saved to /var/cache/conftool/dbconfig/20230606-014610-ladsgroup.json
  • 01:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P48828 and previous config saved to /var/cache/conftool/dbconfig/20230606-013604-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48827 and previous config saved to /var/cache/conftool/dbconfig/20230606-013104-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48826 and previous config saved to /var/cache/conftool/dbconfig/20230606-012058-ladsgroup.json
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T336886)', diff saved to https://phabricator.wikimedia.org/P48825 and previous config saved to /var/cache/conftool/dbconfig/20230606-010704-ladsgroup.json
  • 01:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48824 and previous config saved to /var/cache/conftool/dbconfig/20230606-010643-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T336886)', diff saved to https://phabricator.wikimedia.org/P48823 and previous config saved to /var/cache/conftool/dbconfig/20230606-005357-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48822 and previous config saved to /var/cache/conftool/dbconfig/20230606-005336-ladsgroup.json
  • 00:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48821 and previous config saved to /var/cache/conftool/dbconfig/20230606-005137-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48820 and previous config saved to /var/cache/conftool/dbconfig/20230606-003830-ladsgroup.json
  • 00:37 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 00:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P48819 and previous config saved to /var/cache/conftool/dbconfig/20230606-003631-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P48818 and previous config saved to /var/cache/conftool/dbconfig/20230606-002324-ladsgroup.json
  • 00:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48817 and previous config saved to /var/cache/conftool/dbconfig/20230606-002125-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48816 and previous config saved to /var/cache/conftool/dbconfig/20230606-001914-ladsgroup.json
  • 00:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48815 and previous config saved to /var/cache/conftool/dbconfig/20230606-001836-ladsgroup.json
  • 00:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48814 and previous config saved to /var/cache/conftool/dbconfig/20230606-000818-ladsgroup.json
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48813 and previous config saved to /var/cache/conftool/dbconfig/20230606-000330-ladsgroup.json

2023-06-05

  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T336886)', diff saved to https://phabricator.wikimedia.org/P48812 and previous config saved to /var/cache/conftool/dbconfig/20230605-235346-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48811 and previous config saved to /var/cache/conftool/dbconfig/20230605-235310-ladsgroup.json
  • 23:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) (duration: 07m 02s)
  • 23:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P48810 and previous config saved to /var/cache/conftool/dbconfig/20230605-234824-ladsgroup.json
  • 23:43 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 23:42 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in group0 wikis (T299954)
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48809 and previous config saved to /var/cache/conftool/dbconfig/20230605-233804-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48808 and previous config saved to /var/cache/conftool/dbconfig/20230605-233318-ladsgroup.json
  • 23:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T336886)', diff saved to https://phabricator.wikimedia.org/P48807 and previous config saved to /var/cache/conftool/dbconfig/20230605-233107-ladsgroup.json
  • 23:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48806 and previous config saved to /var/cache/conftool/dbconfig/20230605-233046-ladsgroup.json
  • 23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:25 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 23:24 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 23:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P48805 and previous config saved to /var/cache/conftool/dbconfig/20230605-232258-ladsgroup.json
  • 23:22 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:22 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 23:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.provision (exit_code=93) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48804 and previous config saved to /var/cache/conftool/dbconfig/20230605-231540-ladsgroup.json
  • 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:15 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
  • 23:14 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove mgmt DNS for ssw1-a1 for testing - pt1979@cumin2002"
  • 23:12 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:11 jforrester@deploy1002: Finished deploy [integration/docroot@6eefe56]: I5c1b92 for T334492 (duration: 00m 05s)
  • 23:10 jforrester@deploy1002: Started deploy [integration/docroot@6eefe56]: I5c1b92 for T334492
  • 23:09 jforrester@deploy1002: Finished deploy [integration/docroot@ab77611]: Idf6c7a (duration: 00m 08s)
  • 23:09 jforrester@deploy1002: Started deploy [integration/docroot@ab77611]: Idf6c7a
  • 23:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48803 and previous config saved to /var/cache/conftool/dbconfig/20230605-230752-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P48802 and previous config saved to /var/cache/conftool/dbconfig/20230605-230034-ladsgroup.json
  • 22:57 mutante: contint2001 - sudo systemctl restart apache2
  • 22:57 mutante: contint2001 - sudo apt-get remove --purge libapache2-mod-php7.3 php7.3-cli php7.3-common php7.3-json php7.3-opcache php7.3-readline
  • 22:55 jforrester@deploy1002: Finished deploy [integration/docroot@8255d99]: I6c7575 for T337425 (duration: 00m 13s)
  • 22:55 jforrester@deploy1002: Started deploy [integration/docroot@8255d99]: I6c7575 for T337425
  • 22:53 mutante: contint2001 (prod main CI server) - upgrading PHP 7.3 to 7.4
  • 22:49 zabe@deploy1002: Finished scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954) (duration: 09m 13s)
  • 22:46 mutante: contint2002, contint1002 - upgrading PHP from 7.3 to 7.4
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48801 and previous config saved to /var/cache/conftool/dbconfig/20230605-224528-ladsgroup.json
  • 22:41 zabe@deploy1002: zabe: Backport for Stop writing to revision_comment_temp in testwiki (T299954) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 22:40 zabe@deploy1002: Started scap: Backport for Stop writing to revision_comment_temp in testwiki (T299954)
  • 22:37 ladsgroup@deploy1002: Finished scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) (duration: 09m 04s)
  • 22:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T336886)', diff saved to https://phabricator.wikimedia.org/P48800 and previous config saved to /var/cache/conftool/dbconfig/20230605-223035-ladsgroup.json
  • 22:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 22:29 ladsgroup@deploy1002: ladsgroup: Backport for moveToExternal: Actually convert encoding of cur_text (T337700) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:28 ladsgroup@deploy1002: Started scap: Backport for moveToExternal: Actually convert encoding of cur_text (T337700)
  • 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48799 and previous config saved to /var/cache/conftool/dbconfig/20230605-222745-ladsgroup.json
  • 22:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 22:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 22:24 ladsgroup@deploy1002: Finished scap: Backport for Revert "Remove legacy encoding option from dawiktionary" (duration: 07m 40s)
  • 22:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48798 and previous config saved to /var/cache/conftool/dbconfig/20230605-222339-ladsgroup.json
  • 22:18 ladsgroup@deploy1002: ladsgroup: Backport for Revert "Remove legacy encoding option from dawiktionary" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 22:17 ladsgroup@deploy1002: Started scap: Backport for Revert "Remove legacy encoding option from dawiktionary"
  • 22:13 ladsgroup@deploy1002: Finished scap: Backport for Help measure the impact of saneitizer jobs (T336698) (duration: 09m 48s)
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48797 and previous config saved to /var/cache/conftool/dbconfig/20230605-220833-ladsgroup.json
  • 22:05 ladsgroup@deploy1002: ladsgroup: Backport for Help measure the impact of saneitizer jobs (T336698) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 22:03 ladsgroup@deploy1002: Started scap: Backport for Help measure the impact of saneitizer jobs (T336698)
  • 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1016.eqiad.wmnet
  • 22:01 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1016.eqiad.wmnet
  • 21:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48796 and previous config saved to /var/cache/conftool/dbconfig/20230605-215345-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P48795 and previous config saved to /var/cache/conftool/dbconfig/20230605-215326-ladsgroup.json
  • 21:51 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1016.eqiad.wmnet
  • 21:50 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1016.eqiad.wmnet
  • 21:42 urbanecm@deploy1002: Finished scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) (duration: 25m 38s)
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48794 and previous config saved to /var/cache/conftool/dbconfig/20230605-213839-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48793 and previous config saved to /var/cache/conftool/dbconfig/20230605-213819-ladsgroup.json
  • 21:35 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs1015.eqiad.wmnet
  • 21:35 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T335845)', diff saved to https://phabricator.wikimedia.org/P48792 and previous config saved to /var/cache/conftool/dbconfig/20230605-213202-ladsgroup.json
  • 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 21:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 21:30 urbanecm@deploy1002: urbanecm: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:29 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 21:29 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 21:25 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P48791 and previous config saved to /var/cache/conftool/dbconfig/20230605-212333-ladsgroup.json
  • 21:23 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs1015.eqiad.wmnet
  • 21:18 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 21:17 urbanecm@deploy1002: Started scap: Backport for NewImpact: Fix renderMode parsing for Special:Impact (T338085)
  • 21:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (T338093) (duration: 24m 34s)
  • 21:15 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48790 and previous config saved to /var/cache/conftool/dbconfig/20230605-210827-ladsgroup.json
  • 21:05 urbanecm@deploy1002: urbanecm: Backport for Update interwiki cache (T338093) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:51 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache (T338093)
  • 20:48 cjming: end of UTC late backport window
  • 20:47 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # verify T338180 fix
  • away: payments-wiki upgraded from 2b4203df to f3b229c6
  • 20:46 cjming@deploy1002: Finished scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" (duration: 09m 57s)
  • 20:38 cjming@deploy1002: cjming: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:36 cjming@deploy1002: Started scap: Backport for Revert "Revert "VisualEditorFeatureUse sampling rate to 1 everywhere""
  • 20:35 cjming@deploy1002: Finished scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) (duration: 24m 57s)
  • 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T336886)', diff saved to https://phabricator.wikimedia.org/P48789 and previous config saved to /var/cache/conftool/dbconfig/20230605-202916-ladsgroup.json
  • 20:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48788 and previous config saved to /var/cache/conftool/dbconfig/20230605-202855-ladsgroup.json
  • 20:23 cjming@deploy1002: cjming: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48787 and previous config saved to /var/cache/conftool/dbconfig/20230605-201349-ladsgroup.json
  • 20:10 cjming@deploy1002: Started scap: Backport for Add initial stream configs for Android article events using Metrics Platform Java client library (T330355)
  • 20:09 urbanecm: [urbanecm@deploy1002 ~]$ sudo /usr/local/sbin/fix-staging-perms # attempt to fix permission errors when doing a backport
  • 19:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P48786 and previous config saved to /var/cache/conftool/dbconfig/20230605-195842-ladsgroup.json
  • 19:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48785 and previous config saved to /var/cache/conftool/dbconfig/20230605-194336-ladsgroup.json
  • 19:32 brett: Maglev LVS scheduler rollout in eqiad finished (puppet re-enabled) - T263797
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts wdqs2011.codfw.wmnet
  • 19:12 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
  • 19:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48784 and previous config saved to /var/cache/conftool/dbconfig/20230605-190702-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T336886)', diff saved to https://phabricator.wikimedia.org/P48783 and previous config saved to /var/cache/conftool/dbconfig/20230605-190528-ladsgroup.json
  • 19:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 19:03 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
  • 18:58 bking@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts wdqs2011.codfw.wmnet
  • 18:56 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
  • 18:52 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: revert - remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 11m 54s)
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48782 and previous config saved to /var/cache/conftool/dbconfig/20230605-185156-ladsgroup.json
  • 18:48 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
  • 18:48 inflatador: bking@cumin1001 depooling wdqs2011 for fw update T331297
  • 18:48 inflatador: bking@cumin1001 repooling wdqs2010 T331297
  • 18:45 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P48781 and previous config saved to /var/cache/conftool/dbconfig/20230605-183650-ladsgroup.json
  • 18:35 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
  • 18:32 inflatador: bking@cumin1001 depooling wdqs2010 for fw update T331297
  • 18:30 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: revert - Remove unused page_change rc streams - T336817 (duration: 11m 23s)
  • 18:29 sukhe: homer "cr*-eqiad*" commit "Gerrit: 927246 remove old gerrit service IP"
  • 18:28 brett: Maglev LVS scheduler rollout in eqiad (puppet disabled) - T263797
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48780 and previous config saved to /var/cache/conftool/dbconfig/20230605-182144-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T336886)', diff saved to https://phabricator.wikimedia.org/P48779 and previous config saved to /var/cache/conftool/dbconfig/20230605-181935-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48778 and previous config saved to /var/cache/conftool/dbconfig/20230605-181915-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48777 and previous config saved to /var/cache/conftool/dbconfig/20230605-181219-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48776 and previous config saved to /var/cache/conftool/dbconfig/20230605-180408-ladsgroup.json
  • 17:58 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 17:58 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48775 and previous config saved to /var/cache/conftool/dbconfig/20230605-175712-ladsgroup.json
  • 17:50 otto@deploy1002: Synchronized wmf-config/ext-EventStreamConfig.php: no-op: Remove unused page_change rc streams - T336817 (duration: 20m 11s)
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P48774 and previous config saved to /var/cache/conftool/dbconfig/20230605-174902-ladsgroup.json
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P48773 and previous config saved to /var/cache/conftool/dbconfig/20230605-174206-ladsgroup.json
  • 17:38 cdanis@deploy1002: Finished scap: Backport for Enable user network probe events (T332024) (duration: 10m 02s)
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48772 and previous config saved to /var/cache/conftool/dbconfig/20230605-173356-ladsgroup.json
  • 17:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T336886)', diff saved to https://phabricator.wikimedia.org/P48771 and previous config saved to /var/cache/conftool/dbconfig/20230605-173002-ladsgroup.json
  • 17:30 cdanis@deploy1002: cdanis: Backport for Enable user network probe events (T332024) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48770 and previous config saved to /var/cache/conftool/dbconfig/20230605-172942-ladsgroup.json
  • 17:28 cdanis@deploy1002: Started scap: Backport for Enable user network probe events (T332024)
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48769 and previous config saved to /var/cache/conftool/dbconfig/20230605-172700-ladsgroup.json
  • 17:26 cdanis@deploy1002: Backport cancelled.
  • 17:26 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting (take 2) - T336817 (duration: 09m 25s)
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T336886)', diff saved to https://phabricator.wikimedia.org/P48768 and previous config saved to /var/cache/conftool/dbconfig/20230605-172124-ladsgroup.json
  • 17:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 17:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 17:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48767 and previous config saved to /var/cache/conftool/dbconfig/20230605-172103-ladsgroup.json
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48766 and previous config saved to /var/cache/conftool/dbconfig/20230605-171436-ladsgroup.json
  • 17:12 cdanis@deploy1002: Backport cancelled.
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48765 and previous config saved to /var/cache/conftool/dbconfig/20230605-170557-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P48764 and previous config saved to /var/cache/conftool/dbconfig/20230605-165929-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P48763 and previous config saved to /var/cache/conftool/dbconfig/20230605-165051-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48762 and previous config saved to /var/cache/conftool/dbconfig/20230605-164423-ladsgroup.json
  • 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2013.codfw.wmnet with OS bullseye
  • 16:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T336886)', diff saved to https://phabricator.wikimedia.org/P48761 and previous config saved to /var/cache/conftool/dbconfig/20230605-163714-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48760 and previous config saved to /var/cache/conftool/dbconfig/20230605-163653-ladsgroup.json
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48759 and previous config saved to /var/cache/conftool/dbconfig/20230605-163545-ladsgroup.json
  • 16:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T336886)', diff saved to https://phabricator.wikimedia.org/P48758 and previous config saved to /var/cache/conftool/dbconfig/20230605-162707-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48757 and previous config saved to /var/cache/conftool/dbconfig/20230605-162629-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48756 and previous config saved to /var/cache/conftool/dbconfig/20230605-162147-ladsgroup.json
  • 16:21 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 16:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 16:19 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 16:16 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2013.codfw.wmnet with reason: host reimage
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48755 and previous config saved to /var/cache/conftool/dbconfig/20230605-161123-ladsgroup.json
  • 16:08 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P48754 and previous config saved to /var/cache/conftool/dbconfig/20230605-160640-ladsgroup.json
  • 16:06 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 16:06 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 16:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:59 bblack: mw1419: manually executing a php restart to test new safe-service-restart
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P48753 and previous config saved to /var/cache/conftool/dbconfig/20230605-155617-ladsgroup.json
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2013.codfw.wmnet with OS bullseye
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48752 and previous config saved to /var/cache/conftool/dbconfig/20230605-155134-ladsgroup.json
  • 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['lvs2013']
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T336886)', diff saved to https://phabricator.wikimedia.org/P48751 and previous config saved to /var/cache/conftool/dbconfig/20230605-154926-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48750 and previous config saved to /var/cache/conftool/dbconfig/20230605-154905-ladsgroup.json
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48749 and previous config saved to /var/cache/conftool/dbconfig/20230605-154110-ladsgroup.json
  • 15:37 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['lvs2013']
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['lvs2013']
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T336886)', diff saved to https://phabricator.wikimedia.org/P48748 and previous config saved to /var/cache/conftool/dbconfig/20230605-153542-ladsgroup.json
  • 15:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 15:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48747 and previous config saved to /var/cache/conftool/dbconfig/20230605-153521-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48746 and previous config saved to /var/cache/conftool/dbconfig/20230605-153359-ladsgroup.json
  • 15:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
  • 15:33 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
  • 15:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:27 Amir1: on s3 master: update `text` set old_text = 'O:18:"historyblobcurstub":1:{s:6:"mCurId";i:5532;}', old_flags = 'object' where old_id= 14484; (T337700)
  • 15:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48745 and previous config saved to /var/cache/conftool/dbconfig/20230605-152015-ladsgroup.json
  • 15:19 moritzm: installing debian-archive-keyring updates on bullseye hosts
  • 15:19 mforns@deploy1002: Finished deploy [airflow-dags/analytics@674ec0a]: (no justification provided) (duration: 00m 17s)
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P48744 and previous config saved to /var/cache/conftool/dbconfig/20230605-151853-ladsgroup.json
  • 15:18 mforns@deploy1002: Started deploy [airflow-dags/analytics@674ec0a]: (no justification provided)
  • 15:18 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767 (duration: 102m 46s)
  • 15:07 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs2013.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
  • 15:06 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Setup DNS for lvs2013 - pt1979@cumin2002"
  • 15:05 moritzm: installing avahi security updates
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P48742 and previous config saved to /var/cache/conftool/dbconfig/20230605-150509-ladsgroup.json
  • 15:04 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48741 and previous config saved to /var/cache/conftool/dbconfig/20230605-150347-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T336886)', diff saved to https://phabricator.wikimedia.org/P48740 and previous config saved to /var/cache/conftool/dbconfig/20230605-150138-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48739 and previous config saved to /var/cache/conftool/dbconfig/20230605-150117-ladsgroup.json
  • 14:55 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:55 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48738 and previous config saved to /var/cache/conftool/dbconfig/20230605-145003-ladsgroup.json
  • 14:48 sukhe: homer "cr*-codfw*" commit "Gerrit: 927208 remove decommissioned host lvs2009": T335777
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs2009.codfw.wmnet
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48737 and previous config saved to /var/cache/conftool/dbconfig/20230605-144611-ladsgroup.json
  • 14:45 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs2009.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T336886)', diff saved to https://phabricator.wikimedia.org/P48736 and previous config saved to /var/cache/conftool/dbconfig/20230605-144438-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48735 and previous config saved to /var/cache/conftool/dbconfig/20230605-144417-ladsgroup.json
  • 14:42 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts lvs2009.codfw.wmnet
  • 14:31 ejegg: payments-wiki upgraded from c2f9f8b5 to 2b4203df
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P48734 and previous config saved to /var/cache/conftool/dbconfig/20230605-143105-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48733 and previous config saved to /var/cache/conftool/dbconfig/20230605-142911-ladsgroup.json
  • 14:28 sukhe: codfw low-traffic LVS: set routing-options static route 10.2.1.0/24 next-hop 10.192.49.7
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48732 and previous config saved to /var/cache/conftool/dbconfig/20230605-141559-ladsgroup.json
  • 14:15 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T336886)', diff saved to https://phabricator.wikimedia.org/P48731 and previous config saved to /var/cache/conftool/dbconfig/20230605-141451-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48730 and previous config saved to /var/cache/conftool/dbconfig/20230605-141430-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P48729 and previous config saved to /var/cache/conftool/dbconfig/20230605-141405-ladsgroup.json
  • 14:08 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:08 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48728 and previous config saved to /var/cache/conftool/dbconfig/20230605-135924-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48727 and previous config saved to /var/cache/conftool/dbconfig/20230605-135859-ladsgroup.json
  • 13:57 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:56 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T336886)', diff saved to https://phabricator.wikimedia.org/P48726 and previous config saved to /var/cache/conftool/dbconfig/20230605-135332-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48725 and previous config saved to /var/cache/conftool/dbconfig/20230605-135311-ladsgroup.json
  • 13:46 moritzm: installing python-ipaddress security updates
  • 13:45 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
  • 13:44 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: Host under maintenance
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P48724 and previous config saved to /var/cache/conftool/dbconfig/20230605-134418-ladsgroup.json
  • 13:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
  • 13:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1002.eqiad.wmnet with reason: Host under maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48723 and previous config saved to /var/cache/conftool/dbconfig/20230605-134313-ladsgroup.json
  • 13:41 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:41 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48722 and previous config saved to /var/cache/conftool/dbconfig/20230605-133805-ladsgroup.json
  • 13:36 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T326767
  • 13:35 sukhe@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937 (duration: 01m 06s)
  • 13:35 sukhe@deploy1002: Locking from deployment [ALL REPOSITORIES]: LVS maintenance in codfw, blocking deploys T322937
  • 13:35 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs (duration: 05m 54s)
  • 13:32 bblack: lvs1* (eqiad) - restart pybal for T334703 IPs
  • 13:29 bblack: lvs2* (codfw) - restart pybal for T334703 IPs
  • 13:29 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS restarts in core DCs
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48721 and previous config saved to /var/cache/conftool/dbconfig/20230605-132911-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48720 and previous config saved to /var/cache/conftool/dbconfig/20230605-132807-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T336886)', diff saved to https://phabricator.wikimedia.org/P48719 and previous config saved to /var/cache/conftool/dbconfig/20230605-132703-ladsgroup.json
  • 13:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48718 and previous config saved to /var/cache/conftool/dbconfig/20230605-132642-ladsgroup.json
  • 13:25 hashar: Restarted Zuul due to stalled ssh connection # T309376
  • 13:25 bblack: lvs3* (esams) - restart pybal for T334703 IPs
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P48717 and previous config saved to /var/cache/conftool/dbconfig/20230605-132259-ladsgroup.json
  • 13:19 bblack: lvs5* (eqsin) - restart pybal for T334703 IPs
  • 13:17 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:15 bblack: lvs6* (drmrs) - restart pybal for T334703 IPs
  • 13:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140) (duration: 10m 06s)
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P48716 and previous config saved to /var/cache/conftool/dbconfig/20230605-131301-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48715 and previous config saved to /var/cache/conftool/dbconfig/20230605-131136-ladsgroup.json
  • 13:09 bblack: lvs4* (ulsfo) - restart pybal for T334703 IPs
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48714 and previous config saved to /var/cache/conftool/dbconfig/20230605-130753-ladsgroup.json
  • 13:05 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Make outreachwiki a multilingual Wikidata client (T171140) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:04 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Make outreachwiki a multilingual Wikidata client (T171140)
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T336886)', diff saved to https://phabricator.wikimedia.org/P48713 and previous config saved to /var/cache/conftool/dbconfig/20230605-130228-ladsgroup.json
  • 13:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48712 and previous config saved to /var/cache/conftool/dbconfig/20230605-125754-ladsgroup.json
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P48711 and previous config saved to /var/cache/conftool/dbconfig/20230605-125630-ladsgroup.json
  • 12:52 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 12:51 Amir1: killed prioritizeFilesWithTemplate.php, stopping depool maint.
  • 12:49 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 12:44 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T335845)', diff saved to https://phabricator.wikimedia.org/P48710 and previous config saved to /var/cache/conftool/dbconfig/20230605-124444-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48709 and previous config saved to /var/cache/conftool/dbconfig/20230605-124124-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T336886)', diff saved to https://phabricator.wikimedia.org/P48708 and previous config saved to /var/cache/conftool/dbconfig/20230605-123915-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:39 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
  • 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 12:17 jynus: creating a copy of db1157 binlogs on dbprov1004 T338128
  • 12:15 bblack: lvs*: disabling puppet to roll out new LVS IPs in https://gerrit.wikimedia.org/r/c/operations/puppet/+/924593 - T334703
  • 12:15 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:relforge
  • 11:45 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:relforge
  • 11:39 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard-next
  • 11:21 moritzm: restarting Exim on MXes to pick up OpenSSL updates
  • 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.cdn.roll-restart-reboot-ncredir (exit_code=0) rolling restart_daemons on A:ncredir
  • 11:13 moritzm: bounced ferm on ml-serve2006 (race caused by firewall profile change)
  • 11:08 jmm@cumin2002: START - Cookbook sre.cdn.roll-restart-reboot-ncredir rolling restart_daemons on A:ncredir
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas
  • 10:29 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas
  • 10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:14 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
  • 10:13 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirts - aborrero@cumin1001"
  • 10:11 moritzm: installing openssl security updates on Bullseye
  • 10:08 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:06 godog: truncate xff.log and JobExecutor.log on mwlog1002 to reclaim space - T338127
  • 09:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 09:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 09:39 claime: roll-restart thumbor in eqiad - T337649
  • 09:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 09:38 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=thumbor.*
  • 09:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 09:37 claime: roll-restart thumbor in codfw - T337649
  • 08:40 claime: power-cycling restbase1027 - T338122
  • 07:54 moritzm: installing containerd security updates
  • 07:38 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) (duration: 09m 58s)
  • 07:30 kartik@deploy1002: kartik: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:28 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation for 10 Wikipedias (T337669)
  • 07:25 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 07:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 07:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 07:23 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 07:21 taavi@deploy1002: Finished scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) (duration: 18m 27s)
  • 07:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 07:12 taavi@deploy1002: mlitn and taavi: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 07:02 taavi@deploy1002: Started scap: Backport for [SearchVue] Enable on Norwegian, Hungarian, Catalan, Dutch, and Ukrainian (T336870)
  • 06:20 _joe_: killing a pod with consistently high haproxy queue for thumbor in codfw
  • 06:16 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 60427
  • 06:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 60427

2023-06-03

  • 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
  • 13:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 30 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Host under testing/upgrade
  • 13:28 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2012.codfw.wmnet
  • 13:28 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2012.codfw.wmnet

2023-06-02

  • 20:16 apergos: rsync in ariel screen session, bwlimit 100000, running on dumpsdata1003, pulling from dumpsdata1002, copying over 'other dumps'
  • 18:42 bblack: dns*: puppets are all re-enabled, ntp restarts are done, etc
  • 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 17:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - pt1979@cumin2002"
  • 17:45 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 17:45 pt1979@cumin2002: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 17:27 bblack: dns*: disabling puppet to control rollout of NTP config fixups
  • 16:03 bblack: dns*: removed faulty authdns[12]001 lines from /etc/hosts via cumin+sed
  • 15:35 sukhe: restart ntp.service on dns1002
  • 13:26 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:26 otto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:25 otto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:25 otto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:25 ottomata: deploying flink-operator change to dse-k8s and wikikube to add ingress for health check port - https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/926479
  • 13:24 otto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:24 otto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:24 otto@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:24 otto@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:22 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:03 moritzm: installing at-spi2-core bugfix updates from Bullseye point release
  • 09:35 moritzm: installing texlive-security updates on buster
  • 09:18 akosiaris: update kubernetes-node to 1.23.14-2 on all P:kubernetes::node hosts (88 in total) T337836. Reload systemd for unit changes to take effect
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5016.eqsin.wmnet
  • 08:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5016.eqsin.wmnet
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5015.eqsin.wmnet
  • 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5015.eqsin.wmnet
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5014.eqsin.wmnet
  • 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5014.eqsin.wmnet
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: cp5013.eqsin.wmnet
  • 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: cp5013.eqsin.wmnet
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 0 hosts:
  • 08:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 0 hosts:
  • 08:42 moritzm: installing traceroute bugfix updates from Bullseye point release
  • 07:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast6002.wikimedia.org
  • 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast6002.wikimedia.org
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast3006.wikimedia.org
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast3006.wikimedia.org
  • 07:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1003.wikimedia.org
  • 07:22 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad or A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 07:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1003.wikimedia.org
  • 01:53 ejegg: fundraising python tools upgraded from 759d4c89 to 2ca83336
  • 01:22 cstone: civicrm upgraded from 3819d6d1 to bcc8fccc

2023-06-01

  • 21:06 samtar@deploy1002: Finished scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) (duration: 08m 30s)
  • 20:59 samtar@deploy1002: esanders and samtar: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:57 samtar@deploy1002: Started scap: Backport for Remove deleted config wgVectorStickyHeaderEdit (T337955)
  • 20:54 samtar@deploy1002: Finished scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) (duration: 10m 29s)
  • 20:45 samtar@deploy1002: samtar and ksarabia: Backport for Remove config and AB test code for edit buttons in sticky header (T337955) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:44 samtar@deploy1002: Started scap: Backport for Remove config and AB test code for edit buttons in sticky header (T337955)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on enwiki (T336092) (duration: 07m 56s)
  • 20:15 samtar@deploy1002: dani and samtar: Backport for Deploy Research Incentive survey on enwiki (T336092) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:13 samtar@deploy1002: Started scap: Backport for Deploy Research Incentive survey on enwiki (T336092)
  • 20:12 samtar@deploy1002: Finished scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) (duration: 08m 20s)
  • 20:05 samtar@deploy1002: samtar and dreamyjazz: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:04 samtar@deploy1002: Started scap: Backport for Always collapse by default the CheckUserHelper on loginwiki (T328726)
  • 19:51 ejegg: fundraising python tools upgraded from 72570bdd to 759d4c89
  • 19:12 mforns@deploy1002: Finished deploy [airflow-dags/analytics@21e7354]: (no justification provided) (duration: 02m 42s)
  • 19:11 mforns@deploy1002: Started deploy [airflow-dags/analytics@21e7354]: (no justification provided)
  • 19:11 bblack@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work (duration: 03m 27s)
  • 19:09 bblack: lvs1* (eqiad): upgrade pybal to 1.15.13 - T334703
  • 19:08 bblack@deploy1002: Locking from deployment [ALL REPOSITORIES]: temporary lock for LVS/pybal upgrade work
  • 18:45 bblack: lvs6* (drmrs): upgrade pybal to 1.15.13 - T334703
  • 18:33 bblack: lvs3* (esams): upgrade pybal to 1.15.13 - T334703
  • 18:32 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.11 refs T337525
  • 17:50 mforns@deploy1002: Finished deploy [airflow-dags/analytics@03ca1c1]: (no justification provided) (duration: 00m 10s)
  • 17:50 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-upload_drmrs and A:cp
  • 17:50 mforns@deploy1002: Started deploy [airflow-dags/analytics@03ca1c1]: (no justification provided)
  • 17:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 17:48 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 17:48 fabfur@cumin1001: END (PASS) - Cookbook sre.cdn.run-puppet-restart-varnish (exit_code=0) rolling custom on A:cp-text_drmrs and A:cp
  • 17:47 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 17:47 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 17:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 17:45 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 17:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 17:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 16:55 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: revert: Remove unneeded wgEventBusStreamNamesMap override setting. Recent EventBus changes are not deployed yet? - T336817 (duration: 07m 24s)
  • 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 16:53 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 16:53 aborrero@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 16:52 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin2002"
  • 16:44 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: no-op: Remove unneeded wgEventBusStreamNamesMap override setting - T336817 (duration: 08m 18s)
  • 16:42 bblack: lvs2* (codfw): upgrade pybal to 1.15.13 - T334703
  • 16:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1002.eqiad.wmnet with OS bullseye
  • 16:35 bblack: lvs5* (eqsin): upgrade pybal to 1.15.13 - T334703
  • 16:32 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703 (round 2!)
  • 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 16:10 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
  • 16:07 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol2004-dev.codfw.wmnet with reason: host reimage
  • 16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
  • 16:06 mutante: gerrit - set repo wikimedia/annualreport to readonly (from active) - T337041
  • 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudswift1001.eqiad.wmnet with reason: host reimage
  • 16:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 15:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 15:45 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 15:44 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 15:33 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 15:33 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 15:21 fabfur: running run-puppet-agent on cp6010.drmrs.wmnet to fix icinga check from cookbook
  • 15:15 bblack: lvs400[89]: upgrade pybal to 1.15.13 - T334703
  • 15:11 sukhe: reprepro -C component/pybal bullseye-wikimedia pybal_1.15.13_source.changes
  • 15:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog1002.eqiad.wmnet with OS bullseye
  • 14:59 moritzm: installing python-sqlparse security updates
  • 14:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 14:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 14:55 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudswift1001.eqiad.wmnet with OS bullseye
  • 14:53 moritzm: installing jackson-databind security updates
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:45 fabfur: running run-puppet-agent on cp6009.drmrs.wmnet to fix icinga check from cookbook
  • 14:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
  • 14:41 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog1002.eqiad.wmnet with reason: host reimage
  • 14:40 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-upload_drmrs and A:cp
  • 14:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:39 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:36 fabfur@cumin1001: START - Cookbook sre.cdn.run-puppet-restart-varnish rolling custom on A:cp-text_drmrs and A:cp
  • 14:34 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 14:29 moritzm: installing imagemagick security updates on buster
  • 14:16 herron@cumin1001: START - Cookbook sre.hosts.reimage for host mwlog1002.eqiad.wmnet with OS bullseye
  • 14:14 fabfur: Disabled puppet on A:cp-drmrs for T323557
  • 14:13 mforns@deploy1002: Finished deploy [airflow-dags/analytics@3c9cc85]: (no justification provided) (duration: 00m 11s)
  • 14:13 mforns@deploy1002: Started deploy [airflow-dags/analytics@3c9cc85]: (no justification provided)
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48700 and previous config saved to /var/cache/conftool/dbconfig/20230601-141317-ladsgroup.json
  • 14:11 claime: Removing obsolete mediawiki-services-function-evaluator from registry - T337505
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48699 and previous config saved to /var/cache/conftool/dbconfig/20230601-135811-ladsgroup.json
  • 13:52 moritzm: installing sysstat security updates
  • 13:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:51 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:50 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:49 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:49 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P48698 and previous config saved to /var/cache/conftool/dbconfig/20230601-134304-ladsgroup.json
  • 13:29 moritzm: installing openssl security updates on bullseye
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48697 and previous config saved to /var/cache/conftool/dbconfig/20230601-132758-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T336886)', diff saved to https://phabricator.wikimedia.org/P48695 and previous config saved to /var/cache/conftool/dbconfig/20230601-132319-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20230601-132238-ladsgroup.json
  • 13:21 claime: Removing obsolete mediawiki-services-function-orchestrator from registry - T337505
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) (duration: 11m 08s)
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48694 and previous config saved to /var/cache/conftool/dbconfig/20230601-130732-ladsgroup.json
  • 13:04 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 13:04 urbanecm@deploy1002: urbanecm and daimona: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 bking@cumin1001: START - Cookbook sre.hosts.downtime for 20 days, 0:00:00 on wdqs2021.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 13:02 urbanecm@deploy1002: Started scap: Backport for beta: Stop setting unused $wgCampaignEventsUseNewTrackingToolsSchema (T336362), Set $wgCampaignEventsUseNewTrackingToolsSchema to true in prod (T336364)
  • 12:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:52 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 12:52 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P48693 and previous config saved to /var/cache/conftool/dbconfig/20230601-125226-ladsgroup.json
  • 12:50 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:49 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48692 and previous config saved to /var/cache/conftool/dbconfig/20230601-123720-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T336886)', diff saved to https://phabricator.wikimedia.org/P48691 and previous config saved to /var/cache/conftool/dbconfig/20230601-123236-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48690 and previous config saved to /var/cache/conftool/dbconfig/20230601-122900-ladsgroup.json
  • 12:17 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:17 jelto@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:16 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 jelto@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48689 and previous config saved to /var/cache/conftool/dbconfig/20230601-121354-ladsgroup.json
  • 12:03 Daimona: Creating ce_tracking_tools table for the CampaignEvents extension on testwiki, test2wiki, officewiki, and metawiki # T336365
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P48688 and previous config saved to /var/cache/conftool/dbconfig/20230601-115848-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48687 and previous config saved to /var/cache/conftool/dbconfig/20230601-114342-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T336886)', diff saved to https://phabricator.wikimedia.org/P48686 and previous config saved to /var/cache/conftool/dbconfig/20230601-113843-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48685 and previous config saved to /var/cache/conftool/dbconfig/20230601-113822-ladsgroup.json
  • 11:28 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 11:28 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:26 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:25 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48684 and previous config saved to /var/cache/conftool/dbconfig/20230601-112316-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P48683 and previous config saved to /var/cache/conftool/dbconfig/20230601-110810-ladsgroup.json
  • 11:04 jayme: disabling puppet on all kubernetes control planes for https://gerrit.wikimedia.org/r/c/operations/puppet/+/925707
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48682 and previous config saved to /var/cache/conftool/dbconfig/20230601-105303-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T336886)', diff saved to https://phabricator.wikimedia.org/P48681 and previous config saved to /var/cache/conftool/dbconfig/20230601-104803-ladsgroup.json
  • 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48680 and previous config saved to /var/cache/conftool/dbconfig/20230601-104742-ladsgroup.json
  • 10:45 cmooney@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48679 and previous config saved to /var/cache/conftool/dbconfig/20230601-103236-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P48678 and previous config saved to /var/cache/conftool/dbconfig/20230601-101730-ladsgroup.json
  • 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 10:16 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev.private.codfw.wikimedia.cloud - aborrero@cumin2002"
  • 10:14 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48677 and previous config saved to /var/cache/conftool/dbconfig/20230601-100224-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T336886)', diff saved to https://phabricator.wikimedia.org/P48676 and previous config saved to /var/cache/conftool/dbconfig/20230601-100011-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 09:56 moritzm: installing systemd security updates on bullseye
  • 09:53 Amir1: ladsgroup@mwmaint1002:~$ foreachwikiindblist group2 extensions/AbuseFilter/maintenance/MigrateActorsAF.php (T336224)
  • 09:52 gehel: cleaning apt archives on an-test-worker1002: `sudo apt-get clean`, recovering 14G
  • 09:49 cmooney@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 09:43 cmooney@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcontrol2004-dev']
  • 09:36 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
  • 09:36 cmooney@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcontrol2004-dev']
  • 09:35 cmooney@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcontrol2004-dev']
  • 09:32 volans: installed spicerack v7.2.0 on cumin2002
  • 09:30 aborrero@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 09:21 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
  • 09:18 godog: remove lv prometheus-global - T288196
  • 09:17 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
  • 09:17 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
  • 09:16 volans@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:16 volans@cumin1001: START - Cookbook sre.hosts.dhcp for host sretest1001.eqiad.wmnet
  • 09:13 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
  • 09:12 volans: installed spicerack v7.2.0 on cumin1001
  • 09:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
  • 09:07 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
  • 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
  • 09:02 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
  • 09:01 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1006.eqiad.wmnet
  • 08:57 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1006.eqiad.wmnet
  • 08:56 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcontrol2004-dev.codfw.wmnet with OS bullseye
  • 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
  • 08:53 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol2004-dev - aborrero@cumin1001"
  • 08:49 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 08:48 apergos: UTC morning backport and config training window done
  • 08:30 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:29 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 08:28 jelto@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:28 daniel@deploy1002: Finished scap: Backport for ORES: add model versions configuration and thresholds (T319170) (duration: 10m 12s)
  • 08:28 jelto@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:19 daniel@deploy1002: daniel and isaranto: Backport for ORES: add model versions configuration and thresholds (T319170) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 08:18 daniel@deploy1002: Started scap: Backport for ORES: add model versions configuration and thresholds (T319170)
  • 07:55 daniel@deploy1002: Finished scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) (duration: 09m 09s)
  • 07:48 daniel@deploy1002: daniel: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:46 daniel@deploy1002: Started scap: Backport for Enable parser cache warming jobs for parsoid on frwiki (T329366)
  • 07:42 mlitn@deploy1002: Finished scap: Backport for Add $wgInterwikiLogoOverride (T315269) (duration: 33m 02s)
  • 07:35 moritzm: installing libssh security updates
  • 07:29 mlitn@deploy1002: mlitn: Backport for Add $wgInterwikiLogoOverride (T315269) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 07:20 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: Setup in progress
  • 07:09 mlitn@deploy1002: Started scap: Backport for Add $wgInterwikiLogoOverride (T315269)
  • 06:16 kart_: Updated MinT to 2023-06-01-041041-production (T336525)
  • 06:01 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: applied
  • 05:56 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 05:49 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 05:46 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 05:44 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 05:42 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 05:39 kart_: Updated cxserver to 2023-06-01-041016-production (T337669)
  • 05:34 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 05:34 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 05:32 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 05:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 00:11 eileen: civicrm upgraded from 885208ca to 3819d6d1


Other archives

2000s

2010s

2020s