Server Admin Log/Archive 65

From Wikitech

2023-04-30

  • 14:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Host down T335640
  • 14:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db2184.codfw.wmnet with reason: Host down T335640
  • 08:06 elukey: powercycle ores1002 (mgmt console tty not usable, host frozen)

2023-04-29

  • 23:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: Maint
  • 23:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1132.eqiad.wmnet with reason: Maint
  • 22:54 rzl@cumin2002: dbctl commit (dc=all): 'Depool db1132', diff saved to https://phabricator.wikimedia.org/P47290 and previous config saved to /var/cache/conftool/dbconfig/20230429-225457-rzl.json

2023-04-28

  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for new frack nodes - pt1979@cumin2002"
  • 22:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for new frack nodes - pt1979@cumin2002"
  • 22:31 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 21:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
  • 21:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
  • 20:25 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: setup
  • 20:25 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: setup
  • 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: setup
  • 20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: setup
  • 20:24 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
  • 20:24 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1001.wikimedia.org with reason: setup
  • 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:20 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:16 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:16 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:10 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:07 otto@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 19:07 otto@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 17:50 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@d56b7fb]: (no justification provided) (duration: 00m 10s)
  • 17:50 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@d56b7fb]: (no justification provided)
  • 15:39 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:23 jynus: update schema for backup1-codfw (mediabackups) T327157
  • 15:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2519
  • 15:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2519
  • 14:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1004']
  • 14:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1004']
  • 14:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['stat1004']
  • 14:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['stat1004']
  • 13:21 vgutierrez: import haproxy 2.7.7 on apt.wm.o thirdparty/haproxy27 for bullseye
  • 12:36 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:35 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:35 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:34 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:31 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:30 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:29 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:29 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 12:08 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add new server sretest1003 - jclark@cumin1001"
  • 12:06 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 10:43 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS bullseye
  • 10:28 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:25 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 10:13 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS bullseye
  • 10:11 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS bullseye
  • 10:01 vgutierrez: restarting varnish on cp5017 and cp5025 to drop port 80 - T322774
  • 09:58 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 09:55 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 09:42 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS bullseye
  • 09:31 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS bullseye
  • 09:24 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:13 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 09:11 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 08:57 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS bullseye
  • 08:47 jnuche@deploy1002: Installing scap version "4.51.0" for 593 hosts
  • 08:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2003.codfw.wmnet with OS buster
  • 08:23 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:14 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 08:11 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2003.codfw.wmnet with reason: host reimage
  • 07:57 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2003.codfw.wmnet with OS buster
  • 07:55 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2002.codfw.wmnet with OS buster
  • 07:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 07:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 07:41 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 07:37 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2002.codfw.wmnet with reason: host reimage
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:23 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2002.codfw.wmnet with OS buster
  • 07:22 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache2001.codfw.wmnet with OS buster
  • 07:04 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 07:00 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache2001.codfw.wmnet with reason: host reimage
  • 06:46 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-cache2001.codfw.wmnet with OS buster
  • 05:57 XioNoX: push pfw policies - T335554
  • 05:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 05:29 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 05:28 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 393731
  • 05:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 393731
  • 04:08 eileen: config revision changed from b33fa934 to 2eef4039
  • 03:16 ejegg: SmashPig upgraded from db9fa965 to a9fa7a2c
  • 03:08 ejegg: payments-wiki upgraded from 91582d93 to 61951572
  • 03:05 eileen: config revision changed from 98f2afbb to b33fa934
  • 02:55 eileen: civicrm upgraded from b4a05476 to e7904ea6
  • 02:13 eileen: civicrm upgraded from 601d223e to b4a05476

2023-04-27

  • 22:17 zabe@deploy1002: Finished scap: T334295 (duration: 06m 58s)
  • 22:10 zabe@deploy1002: Started scap: T334295
  • 20:29 TheresNoTime: close UTC late backport window
  • 20:27 samtar@deploy1002: Finished scap: Backport for [cawikisource] Add a wordmark (Vector 2022) (T331823), [cawiktionary] Add a wordmark (Vector 2022) (T331823) (duration: 07m 19s)
  • 20:21 samtar@deploy1002: superpes and samtar: Backport for [cawikisource] Add a wordmark (Vector 2022) (T331823), [cawiktionary] Add a wordmark (Vector 2022) (T331823) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:20 samtar@deploy1002: Started scap: Backport for [cawikisource] Add a wordmark (Vector 2022) (T331823), [cawiktionary] Add a wordmark (Vector 2022) (T331823)
  • 20:20 samtar@deploy1002: Finished scap: Backport for [cawikibooks] Add a wordmark (Vector 2022) (T331823), [cawikinews] Add a wordmark (Vector 2022) (T331823), [cawikiquote] Add a wordmark (Vector 2022) (T331823) (duration: 09m 43s)
  • 20:11 samtar@deploy1002: samtar and superpes: Backport for [cawikibooks] Add a wordmark (Vector 2022) (T331823), [cawikinews] Add a wordmark (Vector 2022) (T331823), [cawikiquote] Add a wordmark (Vector 2022) (T331823) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:10 samtar@deploy1002: Started scap: Backport for [cawikibooks] Add a wordmark (Vector 2022) (T331823), [cawikinews] Add a wordmark (Vector 2022) (T331823), [cawikiquote] Add a wordmark (Vector 2022) (T331823)
  • 19:27 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@bc37201]: (no justification provided) (duration: 00m 10s)
  • 19:27 ejegg: payments-wiki upgraded from 7fa25437 to 91582d93
  • 19:27 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@bc37201]: (no justification provided)
  • 19:16 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@f162f4d]: Deploying T333001 on platform_eng Airflow instance. (duration: 12m 01s)
  • 19:04 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@f162f4d]: Deploying T333001 on platform_eng Airflow instance.
  • 18:47 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.6 refs T330212
  • 18:37 jhuneidi@deploy1002: Finished scap: Backport for Replace references to actionsToolbar (T335469) (duration: 16m 10s)
  • 18:22 jhuneidi@deploy1002: jhuneidi and jforrester: Backport for Replace references to actionsToolbar (T335469) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 18:21 jhuneidi@deploy1002: Started scap: Backport for Replace references to actionsToolbar (T335469)
  • 17:51 herron@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=1) for host kafkamon1003.eqiad.wmnet with OS bullseye
  • 17:39 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
  • 17:35 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafkamon1003.eqiad.wmnet with reason: host reimage
  • 17:27 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@5a46db1] (releasing): (no justification provided) (duration: 00m 40s)
  • 17:27 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@5a46db1] (releasing): (no justification provided)
  • 17:14 hnowlan@deploy1002: Finished deploy [restbase/deploy@a08f56d]: Deploying new wikis: T333272 T334460 T334741 T335020 (duration: 03m 29s)
  • 17:11 hnowlan@deploy1002: Started deploy [restbase/deploy@a08f56d]: Deploying new wikis: T333272 T334460 T334741 T335020
  • 17:06 mutante: deploy2002 - armed the keyholder (sudo keyholder arm and enter passphrase from deployment-key-passphrase in pwstore) - monitoring alert should resolve - T335435
  • 17:01 herron@cumin1001: START - Cookbook sre.ganeti.reimage for host kafkamon1003.eqiad.wmnet with OS bullseye
  • 16:56 volans: uploaded python3-wmflib_1.2.2 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 16:20 herron@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host kafkamon1003.eqiad.wmnet
  • 16:20 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
  • 16:19 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
  • 16:05 herron@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) kafkamon1003.eqiad.wmnet on all recursors
  • 16:05 herron@cumin1001: START - Cookbook sre.dns.wipe-cache kafkamon1003.eqiad.wmnet on all recursors
  • 16:05 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
  • 16:01 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM kafkamon1003.eqiad.wmnet - herron@cumin1001"
  • 15:59 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 15:59 herron@cumin1001: START - Cookbook sre.ganeti.makevm for new host kafkamon1003.eqiad.wmnet
  • 15:58 vgutierrez: restarting varnish on cp5018 and cp5026 to drop port 80 - T322774
  • 15:55 jbond: upload puppetboard_4.3.0-1_all.deb to bookworm-wikimedia
  • 15:37 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:35 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 15:35 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:35 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:34 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 15:34 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 15:34 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:33 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 15:33 legoktm@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 15:32 legoktm@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 15:29 krinkle@deploy1002: Synchronized wmf-config/mc.php: Ia174ea2b0645 (duration: 06m 05s)
  • 15:25 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:23 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:22 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 15:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:22 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:22 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:22 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:21 claime: repooled mw2331.codfw.wmnet - T335486
  • 15:21 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2331.codfw.wmnet
  • 15:21 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2331.codfw.wmnet
  • 15:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:21 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:21 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:20 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:20 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:18 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:17 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:14 legoktm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:13 legoktm@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:10 vgutierrez: restarting varnish on cp5019 and cp5027 to drop port 80 - T322774
  • 15:01 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:59 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 14:58 claime: repooling mw2330.codfw.wmnet - T335487
  • 14:58 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2330.codfw.wmnet
  • 14:58 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2330.codfw.wmnet
  • 14:56 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:55 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add language codes cal and tpv to wmgExtraLanguageNames (T308062) (duration: 07m 55s)
  • 14:49 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and noa: Backport for Add language codes cal and tpv to wmgExtraLanguageNames (T308062) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 14:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add language codes cal and tpv to wmgExtraLanguageNames (T308062)
  • 14:46 ejegg: payments-wiki upgraded from f30bc859 to 7fa25437
  • 14:46 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for lowiki: Use Western style (0-9) numerals (T335345) (duration: 08m 53s)
  • 14:38 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for lowiki: Use Western style (0-9) numerals (T335345) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:37 ejegg: disabled fundraising job ingenico_recurring_fill_scheme_ids (it's all done)
  • 14:37 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for lowiki: Use Western style (0-9) numerals (T335345)
  • 14:36 vgutierrez: restarting varnish on cp5020 and cp5028 to drop port 80 - T322774
  • 14:35 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Close cnwikimedia (T274083) (duration: 11m 05s)
  • 14:29 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:28 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:28 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:28 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:27 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 14:27 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 14:27 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:26 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:26 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:26 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:25 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:25 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for Close cnwikimedia (T274083) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:25 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:25 moritzm: restarting apache/FPM on mw canaries to pick up curl update
  • 14:24 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Close cnwikimedia (T274083)
  • 14:20 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for labtestwiki: disable cirrus completion index (duration: 09m 31s)
  • 14:13 moritzm: installing curl security updates on buster
  • 14:12 lucaswerkmeister-wmde@deploy1002: dcausse and lucaswerkmeister-wmde: Backport for labtestwiki: disable cirrus completion index synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:11 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for labtestwiki: disable cirrus completion index
  • 14:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 14:05 samtar@deploy1002: Finished scap: Backport for Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088) (duration: 38m 35s)
  • 14:00 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:59 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:59 legoktm@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 13:59 legoktm@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 13:48 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 13:45 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1003.eqiad.wmnet with reason: host reimage
  • 13:35 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 13:33 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1003.eqiad.wmnet with OS bullseye
  • 13:31 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:30 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 13:28 samtar@deploy1002: samtar and cmelo: Backport for Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:26 samtar@deploy1002: Started scap: Backport for Enable $wgCampaignEventsEnableMultipleOrganizers in production (T334088)
  • 13:20 samtar@deploy1002: Finished scap: Backport for metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088) (duration: 15m 07s)
  • 13:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 13:06 samtar@deploy1002: samtar and cmelo: Backport for metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:05 samtar@deploy1002: Started scap: Backport for metawiki: Give campaignevents-organize-events to campaignevents-beta-tester only (T334088)
  • 13:04 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 12:56 vgutierrez: restarting varnish on cp5021 and cp5029 to drop port 80 - T322774
  • 12:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 12:40 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1002.eqiad.wmnet with reason: host reimage
  • 12:29 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1002.eqiad.wmnet with OS bullseye
  • 12:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 12:12 moritzm: imported puppet 5.5.22-2+deb13u3 to bookworm-wikimedia T330495
  • 11:56 jbond: upload python3-pypuppetdb_3.1.0-1_all.deb to bookworm
  • 11:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 23951
  • 11:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 23951
  • 11:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 54994
  • 11:41 krinkle@deploy1002: Synchronized wmf-config/: I195978 (duration: 06m 29s)
  • 11:14 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 54994
  • 11:09 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:09 vgutierrez: restarting varnish on cp5022 and cp5030 to drop port 80 - T322774
  • 11:07 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:03 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:00 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:33 vgutierrez: restarting varnish on cp5023 and cp5031 to drop port 80 - T322774
  • 10:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 10:20 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-cache1001.eqiad.wmnet with reason: host reimage
  • 10:09 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 10:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1002.wikimedia.org
  • 10:04 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 09:55 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 09:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 09:54 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 09:43 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:42 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 09:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
  • 09:42 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
  • 09:41 cgoubert@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
  • 09:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2331.codfw.wmnet with reason: PSU failure
  • 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2330.codfw.wmnet with reason: PSU failure
  • 09:41 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2330.codfw.wmnet with reason: PSU failure
  • 09:40 claime: depooling mw2330.codfw.wmnet for HW troubleshooting - T335487
  • 09:39 godog: delete all 2023 replica=unset blocks from thanos - T335406
  • 09:37 claime: depooling mw2331.codfw.wmnet for HW troubleshooting - T335486
  • 09:36 vgutierrez: restarting varnish on cp5024 and cp5032 to drop port 80 - T322774
  • 09:34 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-cache1001.eqiad.wmnet with OS bullseye
  • 09:29 moritzm: imported prometheus-rsyslog-exporter to bookworm-wikimedia T330495
  • 09:29 moritzm: imported wmf-certificates to bookworm-wikimedia T330495
  • 09:14 vgutierrez: restarting varnish on cp4037 and cp4045 to drop port 80 - T322774
  • 09:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@fb6f0ea] (releasing): (no justification provided) (duration: 00m 40s)
  • 09:10 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@fb6f0ea] (releasing): (no justification provided)
  • 09:09 godog: restart thanos-compact on thanos-fe2001 - T335406
  • 09:06 moritzm: uploaded debdeploy 0.0.99.13+deb12u1 to bookworm-wikimedia T330495
  • 09:00 godog: delete overlapping block 01GY1CQ4EAKRV9BQ8D9JB1VWGJ from thanos - T335406
  • 08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 112
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 112
  • 08:24 vgutierrez: restarting varnish on cp4038 and cp4046 to drop port 80 - T322774
  • 08:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
  • 08:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
  • 07:50 apergos: UTC morning backport and config training window complete
  • 07:45 jnuche@deploy1002: Finished scap: Backport for Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410) (duration: 08m 33s)
  • 07:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15169
  • 07:38 jnuche@deploy1002: thiemowmde and jnuche: Backport for Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:37 jnuche@deploy1002: Started scap: Backport for Hide wrong "this reference is used 0 times" in citation dialog (T241885 T335410)
  • 07:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15169
  • 07:23 moritzm: uploaded debmonitor-client 0.3.2-1+deb12u1 to bookworm-wikimedia T330495
  • 05:56 XioNoX: Configure 1:1 NAT for new fr-tech hosts - T335441
  • 05:51 XioNoX: downgrade SGIX RS BGP sessions to non-primary
  • 00:01 zabe@deploy1002: Finished scap: T334295 (duration: 06m 53s)

2023-04-26

  • 23:54 zabe@deploy1002: Started scap: T334295
  • 23:32 zabe@deploy1002: Finished scap: Backport for Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451) (duration: 07m 07s)
  • 23:26 zabe@deploy1002: zabe and nray: Backport for Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 23:25 zabe@deploy1002: Started scap: Backport for Fix `a.image:not(.noviewer,.metadata),a.thumbimage:not(.noviewer,.metadata)' is not a valid selector` bug (T335451)
  • 22:06 samtar@deploy1002: Finished scap: Backport for interwiki: update URL to XTools (duration: 09m 43s)
  • 21:57 samtar@deploy1002: musikanimal and samtar: Backport for interwiki: update URL to XTools synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:56 samtar@deploy1002: Started scap: Backport for interwiki: update URL to XTools
  • 21:39 brett: Re-enable Puppet on LVS[4008-4010] - T263797
  • 21:02 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=labweb-ssl
  • 21:00 bblack@cumin1001: conftool action : set/pooled=yes; selector: service=labweb
  • 20:37 jhuneidi@deploy1002: Finished scap: Backport for Set Vector 2022 as default skin on Polish Wikipedia (T335311) (duration: 09m 22s)
  • 20:29 jhuneidi@deploy1002: jhuneidi and jdrewniak: Backport for Set Vector 2022 as default skin on Polish Wikipedia (T335311) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:28 jhuneidi@deploy1002: Started scap: Backport for Set Vector 2022 as default skin on Polish Wikipedia (T335311)
  • 19:47 brett: Disable Puppet on LVS[4008-4010] for rollout of LVS maglev hashing scheduler - T263797
  • 19:16 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@eb07d71]: fetch_conda: path globs must not be quoted (duration: 00m 27s)
  • 19:15 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@eb07d71]: fetch_conda: path globs must not be quoted
  • 19:10 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@5f2ec35]: repoint shebang lines of conda env (duration: 00m 23s)
  • 19:10 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@5f2ec35]: repoint shebang lines of conda env
  • 18:34 ebernhardson@deploy1002: Finished deploy [search/mjolnir/deploy@ba52b43]: replace python env deployment method with conda env from gitlab (duration: 00m 24s)
  • 18:33 ebernhardson@deploy1002: Started deploy [search/mjolnir/deploy@ba52b43]: replace python env deployment method with conda env from gitlab
  • 18:16 jhuneidi@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.6 refs T330212 (duration: 06m 04s)
  • 18:10 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.6 refs T330212
  • 17:37 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5016
  • 17:37 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:37 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 17:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
  • 17:31 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5016 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 17:29 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:24 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5016
  • 17:23 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5015
  • 17:23 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:23 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 17:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
  • 17:21 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5015 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 17:17 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:13 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5015
  • 17:12 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5014
  • 17:12 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:12 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5014 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 17:11 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5014 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 16:56 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 16:50 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5014
  • 16:48 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5013
  • 16:48 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:48 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5013 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 16:46 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cp5013 decommissioned, removing all IPs except the asset tag one - robh@cumin1001"
  • 16:44 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 16:36 robh@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5013
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2002.codfw.wmnet with OS bullseye
  • 16:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5014.eqsin.wmnet with OS bullseye
  • 16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5013.eqsin.wmnet with OS bullseye
  • 16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5015.eqsin.wmnet with OS bullseye
  • 16:29 robh@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5016.eqsin.wmnet with OS bullseye
  • 16:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:17 vgutierrez: restarting varnish on cp4039 and cp4047 to drop port 80 - T322774
  • 16:10 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:08 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 16:05 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 15:51 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 15:49 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 15:46 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5016.eqsin.wmnet with reason: host reimage
  • 15:44 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5015.eqsin.wmnet with reason: host reimage
  • 15:43 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5014.eqsin.wmnet with reason: host reimage
  • 15:43 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - robh@cumin1001"
  • 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 15:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2002.codfw.wmnet with reason: host reimage
  • 15:34 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 15:31 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 15:21 robh@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 15:20 htriedman@deploy1002: Finished deploy [airflow-dags/platform_eng@5061681]: (no justification provided) (duration: 00m 20s)
  • 15:19 htriedman@deploy1002: Started deploy [airflow-dags/platform_eng@5061681]: (no justification provided)
  • 15:18 robh@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5013.eqsin.wmnet with reason: host reimage
  • 15:14 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5016.eqsin.wmnet with OS bullseye
  • 15:14 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5015.eqsin.wmnet with OS bullseye
  • 15:13 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5014.eqsin.wmnet with OS bullseye
  • 14:45 robh@cumin1001: START - Cookbook sre.hosts.reimage for host cp5013.eqsin.wmnet with OS bullseye
  • 14:41 vgutierrez: restarting varnish on cp4040 and cp4048 to drop port 80 - T322774
  • 14:34 cgoubert@deploy1002: Finished scap: Backport for Revert "debug.json: List primary DC servers first" (duration: 08m 07s)
  • 14:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 14:28 cgoubert@deploy1002: cgoubert: Backport for Revert "debug.json: List primary DC servers first" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:26 cgoubert@deploy1002: Started scap: Backport for Revert "debug.json: List primary DC servers first"
  • 14:24 cgoubert@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchback - T327920 (duration: 69m 03s)
  • 14:16 marostegui: Update dns for parsercache T327920
  • 14:10 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 14:08 claime: Phase 9.5 Update DNS records for new database masters
  • 14:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 14:07 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 14:07 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:05 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:05 claime: Restarting maintenance jobs - T327920
  • 14:04 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:04 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:03 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:03 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-04-26 14:03:01.527715
  • 14:00 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:59 claime: Going to read-only for mediawiki datacenter switchback - T327920
  • 13:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:55 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 1239
  • 13:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1239
  • 13:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 136106
  • 13:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 136106
  • 13:47 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:46 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:45 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:45 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:45 claime: Stopping maintenance scripts for datacenter switchback - T327920
  • 13:43 vgutierrez: restarting varnish on cp4041 and cp4049 to drop port 80 - T322774
  • 13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 13:35 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 13:35 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
  • 13:31 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 13:31 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:25 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:25 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:23 claime: Starting mediawiki datacenter switchback preparation - T327920
  • 13:15 cgoubert@deploy1002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchback - T327920
  • 13:14 claime: Locking scap for datacenter switchback - T327920
  • 13:13 vgutierrez: restarting varnish on cp4042 and cp4050 to drop port 80 - T322774
  • 13:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:06 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:56 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:55 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 12:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:49 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:49 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:49 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:40 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 12:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 36s)
  • 12:13 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
  • 12:11 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 33s)
  • 12:10 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
  • 12:10 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 01m 15s)
  • 12:09 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
  • 12:03 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided) (duration: 00m 34s)
  • 12:03 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@93a04bd] (releasing): (no justification provided)
  • 11:37 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:27 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 11:25 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 11:16 moritzm: import php-excimer 1.0.2-1+wmf3+buster1+icu67 to component/icu67 T332964
  • 11:15 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:54 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955] (duration: 01m 30s)
  • 10:52 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955]
  • 10:52 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955] (duration: 02m 08s)
  • 10:50 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955]
  • 10:49 btullis@deploy1002: Finished deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955] (duration: 05m 23s)
  • 10:44 btullis@deploy1002: Started deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955]
  • 10:25 vgutierrez: restarting varnish on cp4043 and cp4051 to drop port 80 - T322774
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 1828
  • 10:07 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1828
  • 09:57 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 09:54 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 09:54 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 09:49 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 09:49 btullis@cumin1001: Added views for new wiki: kbdwiktionary T333270
  • 09:31 vgutierrez: restarting varnish on cp4044 and cp4052 to drop port 80 - T322774
  • 09:26 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955] (duration: 00m 04s)
  • 09:26 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@571f955]
  • 09:25 btullis@deploy1002: Finished deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955] (duration: 00m 05s)
  • 09:25 btullis@deploy1002: Started deploy [analytics/refinery@571f955] (thin): Regular analytics weekly train THIN [analytics/refinery@571f955]
  • 09:24 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 09:20 btullis@deploy1002: Finished deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955] (duration: 00m 46s)
  • 09:19 btullis@deploy1002: Started deploy [analytics/refinery@571f955]: Regular analytics weekly train [analytics/refinery@571f955]
  • 09:05 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 09:05 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2139.codfw.wmnet with reason: T335396
  • 08:53 moritzm: installing golang-1.11 security updates
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host deploy2002.codfw.wmnet
  • 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32934
  • 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy2002.codfw.wmnet
  • 08:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32934
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:12 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:00 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:39 Emperor: start to load new swift backends, drain old ones T335278 T335279 T335280 T335281
  • 07:39 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:35 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 07:34 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 07:33 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 07:33 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 07:32 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: sync
  • 07:32 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: sync
  • 07:28 taavi@deploy1002: Finished scap: Backport for Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062) (duration: 07m 48s)
  • 07:22 taavi@deploy1002: taavi and migr: Backport for Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:20 taavi@deploy1002: Started scap: Backport for Beta-Wikidata: Enable Labels in Wikidata edit summaries (T327062)
  • 07:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 38082
  • 07:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 38082
  • 07:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9584
  • 07:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9584
  • 07:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4826
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4826
  • 07:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 55818
  • 07:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 55818
  • 07:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
  • 07:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 06:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
  • 06:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 133840
  • 06:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 133840
  • 06:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4826
  • 06:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4826
  • 06:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18106
  • 06:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18106
  • 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7552
  • 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 7552
  • 06:49 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45796
  • 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45796
  • 06:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 140407
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 140407
  • 06:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1828
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1828
  • 06:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
  • 06:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
  • 06:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4657
  • 06:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4657
  • 06:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1239
  • 06:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1239
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 06:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17676
  • 06:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17676
  • 06:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45498
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45498
  • 06:37 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 134823
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 134823
  • 06:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9583
  • 06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9583
  • 06:35 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 24482
  • 06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24482
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137831
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 137831
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9002
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9002
  • 06:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23951
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23951
  • 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9299
  • 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9299
  • 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8529
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8529
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38040
  • 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38040
  • 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4651
  • 06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4651
  • 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132132
  • 06:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 132132
  • 06:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58552
  • 06:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58552
  • 06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23947
  • 06:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23947
  • 06:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17961
  • 06:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 17961
  • 06:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 54994
  • 06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 54994
  • 06:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55818
  • 06:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55818
  • 06:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9009
  • 06:17 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9009
  • 06:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4773
  • 06:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4773
  • 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 133840
  • 06:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 133840
  • 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 140951
  • 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 140951
  • 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4761
  • 06:14 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4761
  • 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 49544
  • 06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 49544
  • 06:12 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 6939
  • 06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6939
  • 06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136907
  • 06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136907
  • 06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4775
  • 06:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4775
  • 06:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 06:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 06:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 23824
  • 06:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 23824
  • 06:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18403
  • 06:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 18403
  • 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 136106
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 136106
  • 06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35280
  • 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35280
  • 06:03 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10089
  • 06:02 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10089
  • 06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 906
  • 06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 906
  • 06:01 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9584
  • 06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9584
  • 06:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 139836
  • 05:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 139836
  • 05:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10030
  • 05:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 10030
  • 05:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38158
  • 05:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38158
  • 05:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63199
  • 05:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63199
  • 05:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 131285
  • 05:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 131285
  • 05:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2518
  • 05:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2518
  • 05:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55967
  • 05:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55967
  • 05:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2519
  • 05:51 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2519
  • 05:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45430
  • 05:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45430
  • 05:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 703
  • 05:46 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
  • 05:45 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 703
  • 05:44 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
  • 05:44 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 703
  • 05:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 703
  • 05:33 XioNoX: bounce SGIX RS BGP - T327284
  • 05:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59369
  • 05:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59369
  • 05:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59360
  • 05:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59360
  • 05:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 59360
  • 05:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 59360
  • 04:55 eileen: civicrm upgraded from 2bc9f372 to 601d223e

2023-04-25

  • 21:40 mutante: gerrit1003 - chown -R gerrit2:gerrit2 /var/lib/gerrit2/review_site/ - T326368
  • 21:19 mutante: gerrit1003 - chown -R gerrit2:gerrit2 /srv/gerrit T333143 T326368
  • 21:17 mutante: gerrit1003 - mv /srv/gerrit/plugins/lfs /srv/gerrit/data/ T333143 T326368
  • 21:14 mutante: gerrit1003 - manually replacing deploy2002 with deploy1002 in /srv/deployment/gerrit/gerrit-cache/.config to fix initial scap deployment T257317 T326368
  • 21:12 mutante: once again running into T257317 when applying gerrit role to new hardware
  • 21:06 mutante: adding production gerrit role to new machine gerrit1003 - monitoring downtimed - but it has a service IP that is going to be added by this and can't be downtimed? (Bug: T326368)
  • 21:04 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on gerrit1003.wikimedia.org with reason: setup
  • 21:04 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on gerrit1003.wikimedia.org with reason: setup
  • 19:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:48 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wdqs2012.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:48 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2006.codfw.wmnet
  • 19:48 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2006.codfw.wmnet
  • 19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wdqs2009.codfw.wmnet
  • 19:46 bking@cumin1001: START - Cookbook sre.hosts.remove-downtime for wdqs2009.codfw.wmnet
  • 19:46 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on wdqs2006.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:46 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on wdqs2006.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 19:23 inflatador: bking@cumin1001 finishing WDQS deploy...restarting `wdqs-categories` across lvs-managed hosts
  • 18:57 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: 0.3.123 (duration: 17m 29s)
  • 18:39 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: 0.3.123
  • 18:18 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.6 refs T330212
  • 16:55 ejegg: payments-wiki upgraded from 2a4c450d to f30bc859
  • 15:39 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:39 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:34 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:33 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:32 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:31 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:31 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:31 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:30 dancy@deploy1002: Installation of scap version "4.50.0" completed for 1 hosts
  • 15:30 dancy@deploy1002: Installing scap version "4.50.0" for 1 hosts
  • 15:28 XioNoX: update cr2-eqsin BBIX interface
  • 15:27 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:27 btullis@cumin1001: Added views for new wiki: azwikimedia T330442
  • 15:25 dancy@deploy1002: Installing scap version "4.50.0" for 592 hosts
  • 15:24 cgoubert@cumin2002: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in eqiad: T335015
  • 15:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on irc2002.wikimedia.org with reason: Non-functional, WIP for Bullseye update
  • 15:22 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on irc2002.wikimedia.org with reason: Non-functional, WIP for Bullseye update
  • 15:22 claime: Datacenter Service Switchback concluded - T335015
  • 15:21 cgoubert@deploy1002: Synchronized README: check the deployment server after switchback - T335015 (duration: 19m 55s)
  • 15:19 cgoubert@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 15:19 cgoubert@cumin2002: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 15:19 cgoubert@cumin2002: START - Cookbook sre.discovery.service-route depool restbase-async in eqiad: T335015
  • 15:19 claime: Restoring restbase-async to codfw only - T335015
  • 15:18 cgoubert@deploy1002: Finished deploy [restbase/deploy@a08f56d]: (no justification provided) (duration: 13m 06s)
  • 15:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 15:05 cgoubert@deploy1002: Started deploy [restbase/deploy@a08f56d]: (no justification provided)
  • 15:02 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:02 inflatador: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 15:02 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:02 btullis@cumin1001: Added views for new wiki: vewikimedia T330704
  • 15:01 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:01 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:01 btullis@cumin1001: Added views for new wiki: ckbwiktionary T331834
  • 15:01 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:01 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:00 btullis@cumin1001: Added views for new wiki: fatwiki T335018
  • 15:00 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 15:00 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 15:00 btullis@cumin1001: Added views for new wiki: kcgwiktionary T334739
  • 15:00 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 14:59 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 14:59 btullis@cumin1001: Added views for new wiki: guwwikinews T334408
  • 14:59 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 14:58 bking@deploy1002: Finished deploy [wdqs/wdqs@0e051d8]: 0.3.123 (duration: 07m 38s)
  • 14:54 cgoubert@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Service Switchback - T335015 (duration: 81m 19s)
  • 14:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 14:50 bking@deploy1002: Started deploy [wdqs/wdqs@0e051d8]: 0.3.123
  • 14:48 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 14:45 claime: Running authdns-update - T335015
  • 14:45 inflatador: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.123`. Pre-deploy tests passing on canary `wdqs1003`
  • 14:44 claime: Switch deployment server back to eqiad - T335015
  • 14:43 claime: All active/active services repooled in codfw - T335015
  • 14:43 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in codfw: Datacenter Services Switchback - T335015
  • 14:36 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 14:35 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 14:26 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacenter Services Switchback - T335015
  • 14:26 claime: All services pooled in eqiad, all depooled in codfw, proceeding with repooling active/active services in codfw - T335015
  • 14:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 14:25 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 14:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:19 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:19 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 14:18 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:16 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 14:04 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:04 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:02 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 14:01 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Services Switchback - T335015
  • 14:00 claime: Starting Datacenter Services Switchback - T335015
  • 13:53 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-worker1002.eqiad.wmnet
  • 13:47 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-test-worker1002.eqiad.wmnet
  • 13:33 cgoubert@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Service Switchback - T335015
  • 13:30 inflatador: bking@cumin1001 transfer.py wdqs2009.codfw.wmnet:/srv/wdqs wdqs2022.codfw.wmnet:/srv/wdqs
  • 13:26 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 13:26 bking@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wdqs2009.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 13:06 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:05 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:05 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:05 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:04 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:03 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:03 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-thanos-fe (exit_code=0) rolling restart_daemons on A:thanos-fe
  • 11:40 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-thanos-fe rolling restart_daemons on A:thanos-fe
  • 10:58 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2394.codfw.wmnet
  • 10:57 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2395.codfw.wmnet
  • 10:57 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2410.codfw.wmnet
  • 10:56 cgoubert@cumin1001: conftool action : set/weight=20; selector: name=mw2411.codfw.wmnet
  • 10:52 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=videoscaler,service=canary
  • 10:52 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=jobrunner,service=canary
  • 10:21 moritzm: installing libxml2 security updates on bullseye
  • 09:34 moritzm: upgrade php-excimer on remaining mediawiki hosts to 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) T332964
  • 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
  • 08:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 07:53 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 06:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46887
  • 06:12 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46887
  • 06:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4557
  • 06:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 4557
  • 04:08 ejegg: re-enabled fundraising scheduled jobs
  • 04:07 ejegg: civicrm upgraded from fa5265bf to 2bc9f372
  • 03:55 ejegg: civicrm upgraded from 14644f30 to fa5265bf
  • 03:52 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.4 (duration: 02m 06s)
  • 03:50 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.6 refs T330212 (duration: 48m 05s)
  • 03:16 eileen: config revision changed from 554bb874 to d1462a30
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.6 refs T330212

2023-04-24

  • 23:15 eileen: civicrm upgraded from c17c8db2 to 26150ed4
  • 22:00 eileen: civicrm upgraded from 3466c2d3 to c17c8db2
  • 20:53 cjming: end of UTC late backport window
  • 20:52 cjming@deploy2002: Finished scap: Backport for Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149) (duration: 11m 25s)
  • 20:42 cjming@deploy2002: cjming and nray: Backport for Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:42 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@6e76561]: (no justification provided) (duration: 00m 23s)
  • 20:41 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@6e76561]: (no justification provided)
  • 20:41 cjming@deploy2002: Started scap: Backport for Fix InvalidCharacterError: Failed to execute 'add' on 'DOMTokenList' (T335149)
  • 20:38 cjming@deploy2002: Finished scap: Backport for [fywiki] Add portal and portal talk namespace (T334807) (duration: 07m 26s)
  • 20:32 cjming@deploy2002: cjming and superpes: Backport for [fywiki] Add portal and portal talk namespace (T334807) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:30 cjming@deploy2002: Started scap: Backport for [fywiki] Add portal and portal talk namespace (T334807)
  • 20:28 cjming@deploy2002: Finished scap: Backport for [guwwikinews] Add a HD logo for vector legacy (T335162) (duration: 07m 22s)
  • 20:22 cjming@deploy2002: superpes and cjming: Backport for [guwwikinews] Add a HD logo for vector legacy (T335162) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:21 cjming@deploy2002: Started scap: Backport for [guwwikinews] Add a HD logo for vector legacy (T335162)
  • 20:19 cjming@deploy2002: Finished scap: Backport for [kcgwiktionary] Add a HD logo for vector legacy (T335162) (duration: 07m 51s)
  • 20:13 cjming@deploy2002: superpes and cjming: Backport for [kcgwiktionary] Add a HD logo for vector legacy (T335162) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:11 cjming@deploy2002: Started scap: Backport for [kcgwiktionary] Add a HD logo for vector legacy (T335162)
  • 19:45 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active for testing
  • 19:42 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on aphlict1001.eqiad.wmnet with reason: aphlict1002 is now active for testing
  • 19:29 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aphlict.discovery.wmnet on all recursors
  • 19:29 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache aphlict.discovery.wmnet on all recursors
  • 18:44 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 17:51 wfan: payments-wiki upgraded from a6288840 to 2a4c450d
  • 17:43 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:36 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:35 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5015.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:32 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5015.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:26 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5014.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:20 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5014.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:19 robh@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:04 robh@cumin1001: START - Cookbook sre.hosts.provision for host cp5013.mgmt.eqsin.wmnet with reboot policy FORCED
  • 17:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5016
  • 16:37 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5016
  • 16:37 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5015
  • 16:36 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5015
  • 16:36 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5014
  • 16:35 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5014
  • 16:35 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp5013
  • 16:34 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp5013
  • 15:48 ejegg: payments-wiki upgraded from 25d867dc to a6288840
  • 15:14 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:14 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
  • 15:11 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
  • 15:09 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:09 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
  • 15:08 vgutierrez: restarting haproxy on cp3064 - T334448
  • 15:07 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: old cp server work - robh@cumin1001"
  • 15:05 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 14:59 eoghan@cumin1001: END (PASS) - Cookbook sre.gitlab.failover (exit_code=0) Failover of gitlab from gitlab1003.wikimedia.org to gitlab1004.wikimedia.org
  • 14:58 inflatador: bking@wdqs1015 repool wdqs1015 as lag is back down
  • 14:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 14:56 eoghan@cumin1001: START - Cookbook sre.dns.wipe-cache https://gitlab-replica.wikimedia.org/ https://gitlab-replica-old.wikimedia.org/ on all recursors
  • 14:47 mutante: DNS - new project language "btm" added - Mandailing language is spoken in Indonesia - https://en.wikipedia.org/wiki/Mandailing_language
  • 14:31 herron: re-enabled icinga meta monitoring on wikitech-static T333837
  • 14:07 herron: disabled icinga meta monitoring on wikitech-static T333837
  • 14:07 herron: beginning alert host failover from alert2001 to alert1001 T333837
  • 13:40 dcausse: repooling wdqs1005
  • 13:32 claime: Deployed push-notifications production for switch to mw-api-int - T334061
  • 13:32 moritzm: installing libxml2 security updates on bullseye
  • 13:27 urbanecm@deploy2002: Finished scap: Backport for Update InterwikiSortOrders (T335019) (duration: 06m 59s)
  • 13:24 eoghan@cumin1001: START - Cookbook sre.gitlab.failover Failover of gitlab from gitlab1003.wikimedia.org to gitlab1004.wikimedia.org
  • 13:24 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 13:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 13:20 urbanecm@deploy2002: Started scap: Backport for Update InterwikiSortOrders (T335019)
  • 13:15 urbanecm@deploy2002: Finished scap: Backport for Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090) (duration: 11m 02s)
  • 13:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 13:14 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 13:13 claime: Deploying push-notifications production for switch to mw-api-int - T334061
  • 13:05 urbanecm@deploy2002: urbanecm and anzx: Backport for Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:04 urbanecm@deploy2002: Started scap: Backport for Disable wmgNewUserMessageOnAutoCreate from Extension:NewUserMessage on knwikisource (T335090)
  • 12:56 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:29 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 12:28 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 12:28 claime: Deploying push-notifications staging for switch to mw-api-int - T334061
  • 11:23 cgoubert@cumin1001: conftool action : set/weight=30; selector: dc=codfw,cluster=api_appserver,service=canary
  • 11:21 cgoubert@cumin1001: conftool action : set/weight=25; selector: dc=codfw,cluster=appserver,service=canary
  • 11:19 cgoubert@cumin1001: conftool action : set/weight=30; selector: dc=eqiad,cluster=appserver,service=canary
  • 11:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:17 cgoubert@cumin1001: START - Cookbook sre.dns.netbox
  • 11:14 cgoubert@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=parsoid,service=canary
  • 11:13 cgoubert@cumin1001: conftool action : set/weight=10; selector: dc=eqiad,cluster=parsoid,service=canary
  • 11:13 claime: Fixing appserver clusters canary weights
  • 10:56 jynus: deployed new ssh key for jcrespo on production cluster
  • 10:29 claime: Datacenter switchover live testing setting db to read-only and back in eqiad successful - T327920
  • 10:29 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite (exit_code=0)
  • 10:29 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.06-set-db-readwrite
  • 10:29 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.03-set-db-readonly (exit_code=0)
  • 10:29 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.03-set-db-readonly
  • 10:27 claime: Datacenter switchover live testing setting db to read-only and back in eqiad - T327920
  • 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ilooremeta out of all services on: 801 hosts
  • 10:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ilooremeta out of all services on: 801 hosts
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Ilooremeta out of all services on: 1262 hosts
  • 10:22 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Ilooremeta out of all services on: 1262 hosts
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hghani out of all services on: 1262 hosts
  • 10:20 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hghani out of all services on: 1262 hosts
  • 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hghani out of all services on: 801 hosts
  • 10:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hghani out of all services on: 801 hosts
  • 10:17 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hibashaath out of all services on: 801 hosts
  • 10:17 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hibashaath out of all services on: 801 hosts
  • 10:16 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hibashaath out of all services on: 1262 hosts
  • 10:14 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Hibashaath out of all services on: 1262 hosts
  • 10:11 marostegui: Enable replication eqiad -> codfw on s1 dbmaint eqiad T335266
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 38 hosts with reason: Enabling replication T335266
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 38 hosts with reason: Enabling replication T335266
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 35 hosts with reason: Enabling replication T335266
  • 10:08 marostegui: Enable replication eqiad -> codfw on s4 dbmaint eqiad T335266
  • 10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 35 hosts with reason: Enabling replication T335266
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 24 hosts with reason: Enabling replication T335266
  • 10:06 marostegui: Enable replication eqiad -> codfw on s3 dbmaint eqiad T335266
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 24 hosts with reason: Enabling replication T335266
  • 10:01 moritzm: installing git security updates
  • 09:55 slyngs: Update LDAP schema wmf-user: T148048
  • 09:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 28 hosts with reason: Enabling replication T335266
  • 09:55 marostegui: Enable replication eqiad -> codfw on s7 dbmaint eqiad T335266
  • 09:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 28 hosts with reason: Enabling replication T335266
  • 09:25 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-worker1110.eqiad.wmnet
  • 09:21 moritzm: upgrade php-excimer on mw canaries to 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) T332964
  • 08:45 moritzm: uploaded php-excimer 1.0.2-1+wmf3+buster1 (which rebases Excimer to 1.1.1) to component/php74 for buster-wikimedia T332964
  • 08:44 marostegui: Enable replication eqiad -> codfw on s8 dbmaint eqiad T335266
  • 08:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 34 hosts with reason: Enabling replication T335266
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 34 hosts with reason: Enabling replication T335266
  • 08:33 marostegui: Enable replication eqiad -> codfw on s5 dbmaint eqiad T335266
  • 08:32 cgoubert@deploy2002: Finished scap: testing T329857 (duration: 14m 29s)
  • 08:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 26 hosts with reason: Enabling replication T335266
  • 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 26 hosts with reason: Enabling replication T335266
  • 08:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266
  • 08:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266
  • 08:28 marostegui: Enable replication eqiad -> codfw on s6 dbmaint eqiad T335266
  • 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 27 hosts with reason: Enabling replication T335266
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 27 hosts with reason: Enabling replication T335266
  • 08:26 marostegui: Enable replication eqiad -> codfw on s2 dbmaint eqiad T335266
  • 08:25 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host an-worker1110.eqiad.wmnet
  • 08:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 08:21 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 08:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 10 hosts with reason: Enabling replication T335266
  • 08:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 10 hosts with reason: Enabling replication T335266
  • 08:20 marostegui: Enable replication eqiad -> codfw on x1 dbmaint eqiad T335266
  • 08:18 cgoubert@deploy2002: Started scap: testing T329857
  • 08:17 marostegui: Enable replication eqiad -> codfw on es5 dbmaint eqiad T335266
  • 08:14 claime: Deploying 909302 on deploy2002 for T329857
  • 08:10 claime: Disabling puppet on deploy2002 - T329857
  • 08:09 claime: Deploying 909302 on deploy1002 for T329857
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on 6 hosts with reason: Enabling replication T335266
  • 08:08 marostegui: Enable replication eqiad -> codfw on es4 dbmaint eqiad T335266
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on 6 hosts with reason: Enabling replication T335266
  • 08:07 marostegui: Enable replication eqiad -> codfw on pc3 dbmaint eqiad T335266
  • 08:06 marostegui: Enable replication eqiad -> codfw on pc2 dbmaint eqiad T335266
  • 08:05 marostegui: Enable replication eqiad -> codfw on pc1 dbmaint eqiad T335266
  • 07:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.41 in codfw
  • 07:51 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.41 in codfw
  • 07:45 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1004.wikimedia.org with OS bullseye
  • 07:44 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.59 in codfw
  • 07:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.59 in codfw
  • 07:39 dcausse: restarting blazegraph on wdqs1005 (stuck for 3+days)
  • 07:38 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.4a in codfw
  • 07:36 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.4a in codfw
  • 07:24 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 07:21 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1004.wikimedia.org with reason: host reimage
  • 07:06 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1004.wikimedia.org with OS bullseye

2023-04-22

  • 05:41 joe: <thumbor/codfw>$ helmfile --state-values-set roll_restart=1 -e codfw sync
  • 05:40 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 05:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 05:39 oblivian@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 05:39 oblivian@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 05:15 hashar@deploy2002: Finished deploy [integration/docroot@b816911]: Update Grafana URL (duration: 00m 11s)
  • 05:15 hashar@deploy2002: Started deploy [integration/docroot@b816911]: Update Grafana URL
  • 05:10 joe: sudo cumin -b 1 -s 20 'A:swift-fe-codfw' 'systemctl restart swift-proxy.service'
  • 04:33 vgutierrez: restart haproxy on cp1087 - T334448

2023-04-21

  • 18:27 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
  • 18:25 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
  • 15:57 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130) (duration: 10m 01s)
  • 15:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:47 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Set wmgUseGraphWithJsonNamespace = true for mediawikiwiki (T124748 T335130)
  • 12:18 duesen: reverted monkey-patch, mwdebug2001 and deploy2002 are back to wmf/1.41.0-wmf.5 (T335183)
  • 11:56 duesen: monkey-patching Ib11a871ff on mwdebug2001 to investigate T335183
  • 09:03 Amir1: finish of the wikibase populate sites table
  • 08:35 Amir1: start of foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
  • 03:19 eileen: civicrm upgraded from 5b63c2b2 to 0fad720a
  • 03:11 eileen: civicrm upgraded from a2e7c079 to 5b63c2b2
  • 01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2011.codfw.wmnet with OS bullseye
  • 01:41 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:22 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 01:19 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2011.codfw.wmnet with reason: host reimage
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2010.codfw.wmnet with OS bullseye
  • 00:37 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:35 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 00:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2010.codfw.wmnet with reason: host reimage
  • 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host backup2011.codfw.wmnet with OS bullseye

2023-04-20

  • 22:48 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2011']
  • 22:18 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2011']
  • 22:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2011']
  • 22:17 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2011']
  • 21:47 zabe@deploy2002: Finished scap: Backport for Update interwiki cache (duration: 06m 26s)
  • 21:42 zabe@deploy2002: zabe: Backport for Update interwiki cache synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:41 zabe@deploy2002: Started scap: Backport for Update interwiki cache
  • 21:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:35 zabe@deploy2002: Finished scap: T334394 (duration: 07m 46s)
  • 21:28 zabe@deploy2002: zabe: T334394 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 21:28 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 21:27 zabe@deploy2002: Started scap: T334394
  • 21:26 zabe: create Wikinews Gungbe # T334394
  • 21:22 inflatador: bking@cumin1001 repool wdqs2012 T331300
  • 21:19 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:19 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:18 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:18 inflatador: bking@cumin1001 depool wdqs2009 for data xfer T331300
  • 21:03 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 20:47 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:36 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 20:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
  • 20:31 thcipriani@deploy2002: Finished scap: Backport for Fix TypeError: trigger.attr is not a function (T335148) (duration: 09m 53s)
  • 20:22 thcipriani@deploy2002: nray and thcipriani: Backport for Fix TypeError: trigger.attr is not a function (T335148) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:21 thcipriani@deploy2002: Started scap: Backport for Fix TypeError: trigger.attr is not a function (T335148)
  • 19:58 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:54 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:47 zabe@deploy2002: Finished scap: Backport for Update interwiki cache (duration: 06m 47s)
  • 19:41 zabe@deploy2002: zabe: Backport for Update interwiki cache synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 19:40 zabe@deploy2002: Started scap: Backport for Update interwiki cache
  • 19:34 zabe@deploy2002: Finished scap: T333266 (duration: 07m 04s)
  • 19:29 zabe@deploy2002: zabe: T333266 synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 19:28 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:27 zabe@deploy2002: Started scap: T333266
  • 19:27 zabe: create Wiktionary Kabardian # T333266
  • 19:16 inflatador: bking@cumin1001 depool wdqs2012.codfw.wmnet for data xfer T331300
  • 19:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:15 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:13 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:58 zabe@deploy2002: Finished scap: Backport for Disable VE as default editor on kcgwiktionary (T334730), db-production: Fix indentation, Update interwiki cache (duration: 07m 06s)
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host backup2011.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:52 zabe@deploy2002: zabe: Backport for Disable VE as default editor on kcgwiktionary (T334730), db-production: Fix indentation, Update interwiki cache synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add backup2011 DNS entries - pt1979@cumin2002"
  • 18:51 zabe@deploy2002: Started scap: Backport for Disable VE as default editor on kcgwiktionary (T334730), db-production: Fix indentation, Update interwiki cache
  • 18:50 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab1003.wikimedia.org with OS bullseye
  • 18:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add backup2011 DNS entries - pt1979@cumin2002"
  • 18:50 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host backup2010.codfw.wmnet with OS bullseye
  • 18:47 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:36 zabe@deploy2002: Finished scap: T335016 (duration: 07m 28s)
  • 18:30 zabe@deploy2002: zabe: T335016 synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 18:29 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 18:29 zabe@deploy2002: Started scap: T335016
  • 18:29 zabe: create Wikipedia Fante # T335016
  • 18:26 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab1003.wikimedia.org with reason: host reimage
  • 18:17 zabe@deploy2002: Finished scap: Backport for Add messages for Fante Wikipedia (fatwiki) (T335016), Localisation updates from https://translatewiki.net., Localisation updates from https://translatewiki.net. (duration: 23m 58s)
  • 18:10 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab1003.wikimedia.org with OS bullseye
  • 18:05 zabe@deploy2002: zabe: Backport for Add messages for Fante Wikipedia (fatwiki) (T335016), Localisation updates from https://translatewiki.net., Localisation updates from https://translatewiki.net. synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:01 sukhe: enable puppet and run agent in A:lvs and A:eqiad CR 910563
  • 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast2003.wikimedia.org with OS bullseye
  • 18:00 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:59 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:54 sukhe: disable puppet in A:lvs and A:eqiad to test CR 910563
  • 17:53 zabe@deploy2002: Started scap: Backport for Add messages for Fante Wikipedia (fatwiki) (T335016), Localisation updates from https://translatewiki.net., Localisation updates from https://translatewiki.net.
  • 17:48 zabe@deploy2002: Finished scap: create kcgwiktionary (T334730) (duration: 08m 08s)
  • 17:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 17:41 zabe@deploy2002: zabe: create kcgwiktionary (T334730) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 17:39 zabe@deploy2002: Started scap: create kcgwiktionary (T334730)
  • 17:39 zabe: create Wiktionary Tyap # T334730
  • 17:39 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast2003.wikimedia.org with reason: host reimage
  • 17:24 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host bast2003.wikimedia.org with OS bullseye
  • 17:02 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001
  • 16:31 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs10[10,13,16,19].eqiad.wmnet: Testing rolling restart (rack1) — T334754 - eevans@cumin1001
  • 16:25 SandraEbele: Deployed refinery using scap, then deployed onto hdfs as part of weekly deployment train.
  • 16:23 claime: repooling parse2010 after fix - T335138
  • 16:22 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse2010.codfw.wmnet
  • 16:22 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse2010.codfw.wmnet
  • 16:20 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host an-airflow1006.eqiad.wmnet with OS buster
  • 16:16 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['bast2003']
  • 16:16 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
  • 16:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['bast2003']
  • 16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
  • 16:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['bast2003']
  • 16:15 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['bast2003']
  • 16:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host bast2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setting sretest2001 back to offline - pt1979@cumin2002"
  • 16:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: setting sretest2001 back to offline - pt1979@cumin2002"
  • 16:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
  • 16:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:01 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-airflow1006.eqiad.wmnet with reason: host reimage
  • 15:59 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1631dea] (duration: 01m 29s)
  • 15:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host bast2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:58 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@1631dea]
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add bast2003 DNS entries - pt1979@cumin2002"
  • 15:56 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea] (thin): Regular analytics weekly train THIN [analytics/refinery@1631dea] (duration: 00m 08s)
  • 15:56 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea] (thin): Regular analytics weekly train THIN [analytics/refinery@1631dea]
  • 15:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add bast2003 DNS entries - pt1979@cumin2002"
  • 15:54 ebysans@deploy2002: Finished deploy [analytics/refinery@1631dea]: Regular analytics weekly train [analytics/refinery@1631dea] (duration: 08m 30s)
  • 15:51 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2006.wikimedia.org with OS bullseye
  • 15:48 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:48 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-airflow1006.eqiad.wmnet with OS buster
  • 15:47 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:47 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:46 ebysans@deploy2002: Started deploy [analytics/refinery@1631dea]: Regular analytics weekly train [analytics/refinery@1631dea]
  • 15:44 SandraEbele: deploying weekly deployment train for analytics refinery.
  • 15:38 sukhe: sudo cumin -b1 -s1200 'A:cp and A:eqsin' 'varnish-frontend-restart'
  • 15:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-airflow1006.eqiad.wmnet
  • 15:37 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
  • 15:36 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
  • 15:33 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:32 bking@cumin1001: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on wdqs2022.codfw.wmnet with reason: attempting WDQS stack on bullseye
  • 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 15:31 ejegg: payments-wiki upgraded from 66be66e0 to 744d82c6
  • 15:28 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2006.wikimedia.org with reason: host reimage
  • 15:27 sukhe: run puppet manually in A:cp and A:eqsin to pick up CR 910005
  • 15:26 sukhe: re-enable puppet in A:cp and A:eqsin
  • 15:23 sukhe: varnish-frontend-restart cp5022
  • 15:21 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 15:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
  • 14:56 sukhe: disable puppet in A:cp and A:eqsin to test CR 910005
  • 14:50 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130) (duration: 07m 40s)
  • 14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-airflow1006.eqiad.wmnet on all recursors
  • 14:49 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-airflow1006.eqiad.wmnet on all recursors
  • 14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
  • 14:47 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-airflow1006.eqiad.wmnet - stevemunene@cumin1001"
  • 14:45 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:45 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-airflow1006.eqiad.wmnet
  • 14:43 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:42 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make $wmgUseGraphWithJsonNamespace depend on $wmgUseJsonConfig (T335130)
  • 14:39 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on parse2010.codfw.wmnet with reason: PSU failure
  • 14:39 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on parse2010.codfw.wmnet with reason: PSU failure
  • 14:33 claime: depooling parse2010 for PSU failure
  • 13:35 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
  • 13:33 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
  • 12:44 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
  • 12:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
  • 12:12 ladsgroup@deploy2002: Finished scap: Backport for Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748) (duration: 07m 48s)
  • 12:05 ladsgroup@deploy2002: aklapper and ladsgroup: Backport for Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 12:04 ladsgroup@deploy2002: Started scap: Backport for Set wmgUseGraphWithJsonNamespace = false for mediawikiwiki (T124748)
  • 10:57 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:57 moritzm: installing openvswitch security updates on bullseye
  • 10:57 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:43 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
  • 10:41 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
  • 09:43 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:40 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:40 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:35 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:35 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:06 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.18 in codfw
  • 09:04 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.18 in codfw
  • 08:57 isaranto@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:17 jnuche@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.5 refs T330211
  • 07:24 moritzm: uploaded imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 to apt.wikimedia.org for buster-wikimedia T328901
  • 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 14593
  • 06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 14593
  • 06:19 moritzm: installing tomcat9 security updates
  • 06:15 joe: enabled requestctl rule for T332061
  • 06:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
  • 06:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on krb2002.codfw.wmnet with reason: Non-functional, WIP for Bullseye update
  • 03:49 eileen: civicrm upgraded from efdf9434 to a2e7c079
  • 00:02 mutante: LDAP - adding uid fnavas-foundation to group wmf - T331482

2023-04-19

  • 23:36 zabe@deploy2002: Finished scap: gerrit:910078 (duration: 06m 40s)
  • 23:29 zabe@deploy2002: Started scap: gerrit:910078
  • 23:15 tzatziki: removing 1 file for legal compliance
  • 23:02 tzatziki: removing 3 files for legal compliance
  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs2022.codfw.wmnet with OS bullseye
  • 22:10 tzatziki: removing 5 files for legal compliance
  • 21:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host wdqs2022.codfw.wmnet with OS bullseye
  • 20:16 zabe@deploy2002: Finished scap: Backport for Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921) (duration: 07m 26s)
  • 20:10 zabe@deploy2002: zabe: Backport for Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:09 zabe@deploy2002: Started scap: Backport for Revert "Revert "dewiki: Allow 'crats to remove sysopship and manage importers"" (T331921)
  • 19:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dns2006.wikimedia.org with OS bullseye
  • 19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2005.wikimedia.org with OS bullseye
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 19:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 19:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 19:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns2005.wikimedia.org with OS bullseye
  • 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2006.wikimedia.org with OS bullseye
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2004.wikimedia.org with OS bullseye
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:50 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 18:36 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2005.wikimedia.org with reason: host reimage
  • 18:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 18:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2004.wikimedia.org with reason: host reimage
  • 18:28 sukhe@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 286m 39s)
  • 18:25 sukhe: restart pybal on lvs1017 to pick up bgp-med change: T321309
  • 18:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2005.wikimedia.org with OS bullseye
  • 18:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1017.eqiad.wmnet with OS bullseye
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1017.eqiad.wmnet with reason: host reimage
  • 18:00 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dns2004.wikimedia.org with OS bullseye
  • 17:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2006']
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2006']
  • 17:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2005']
  • 17:57 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
  • 17:57 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2005']
  • 17:56 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
  • 17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2006']
  • 17:56 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dns2005']
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2006']
  • 17:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2005']
  • 17:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns2004']
  • 17:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns2004']
  • 17:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1017.eqiad.wmnet with OS bullseye
  • 17:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:21 sukhe: stop pybal in lvs1017 for reimaging
  • 17:14 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2006.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:05 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2005.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:41 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:39 sukhe: restart pybal on lvs1018 to remove bgp-med change: T321309
  • 16:39 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1018
  • 16:35 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1018
  • 16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1018.eqiad.wmnet with OS bullseye
  • 16:17 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:09 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:09 jbond@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:09 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:06 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1018.eqiad.wmnet with reason: host reimage
  • 16:06 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:05 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:05 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:04 jbond@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:04 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:04 jbond@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:04 jbond@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010.codfw.wmnet']
  • 16:02 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1018.eqiad.wmnet with reason: host reimage
  • 15:49 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:48 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1018.eqiad.wmnet with OS bullseye
  • 15:47 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1018
  • 15:47 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1018
  • 15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:42 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host dns2004.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:36 mutante: DNS - added new project language "fat" (fat.wikipedia.org) - the "Fante" language, a dialect of Akan, spoken by 2.8 million people in Ghana - https://en.wikipedia.org/wiki/Fante_dialect T335016
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for dns200[4-6] - pt1979@cumin2002"
  • 15:33 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for dns200[4-6] - pt1979@cumin2002"
  • 15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:20 sukhe: stop pybal on lvs1018 for reimaging: T321309
  • 14:54 sukhe: restart pybal on lvs1019 to pick up bgp-med change
  • 14:42 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1019
  • 14:42 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1019
  • 14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1019.eqiad.wmnet with OS bullseye
  • 14:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
  • 14:19 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1019.eqiad.wmnet with reason: host reimage
  • 14:04 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1019.eqiad.wmnet with OS bullseye
  • 13:41 sukhe@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309
  • 13:41 sukhe@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309 (duration: 00m 16s)
  • 13:41 sukhe@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS reimaging in eqiad, blocking deploys T321309
  • 13:28 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.remove-ghost-objects (exit_code=99) from container wikipedia-en-local-public.a8 in codfw
  • 13:25 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.a8 in codfw
  • 13:16 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:16 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:14 taavi@deploy2002: Finished scap: Backport for cleanup: Remove duplicate permission config of confirmed users (duration: 11m 32s)
  • 13:09 moritzm: installing lldpd security updates
  • 13:04 taavi@deploy2002: func and taavi: Backport for cleanup: Remove duplicate permission config of confirmed users synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:02 taavi@deploy2002: Started scap: Backport for cleanup: Remove duplicate permission config of confirmed users
  • 11:18 hnowlan@puppetmaster1001: conftool action : set/weight=7; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 10:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in codfw
  • 10:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:43 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in codfw
  • 10:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in eqiad
  • 10:40 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in eqiad
  • 10:37 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.e4 in eqiad
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47260 and previous config saved to /var/cache/conftool/dbconfig/20230419-103603-ladsgroup.json
  • 10:34 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.e4 in eqiad
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47259 and previous config saved to /var/cache/conftool/dbconfig/20230419-102057-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 10:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47258 and previous config saved to /var/cache/conftool/dbconfig/20230419-101614-root.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47257 and previous config saved to /var/cache/conftool/dbconfig/20230419-100746-root.json
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P47256 and previous config saved to /var/cache/conftool/dbconfig/20230419-100550-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 10:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47255 and previous config saved to /var/cache/conftool/dbconfig/20230419-100109-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47254 and previous config saved to /var/cache/conftool/dbconfig/20230419-095807-root.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P47253 and previous config saved to /var/cache/conftool/dbconfig/20230419-095316-root.json
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47252 and previous config saved to /var/cache/conftool/dbconfig/20230419-095241-root.json
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47250 and previous config saved to /var/cache/conftool/dbconfig/20230419-095044-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P47249 and previous config saved to /var/cache/conftool/dbconfig/20230419-094836-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47248 and previous config saved to /var/cache/conftool/dbconfig/20230419-094604-root.json
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47247 and previous config saved to /var/cache/conftool/dbconfig/20230419-094302-root.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P47246 and previous config saved to /var/cache/conftool/dbconfig/20230419-093812-root.json
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47245 and previous config saved to /var/cache/conftool/dbconfig/20230419-093737-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47244 and previous config saved to /var/cache/conftool/dbconfig/20230419-093059-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47243 and previous config saved to /var/cache/conftool/dbconfig/20230419-092757-root.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P47242 and previous config saved to /var/cache/conftool/dbconfig/20230419-092307-root.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47241 and previous config saved to /var/cache/conftool/dbconfig/20230419-092232-root.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47240 and previous config saved to /var/cache/conftool/dbconfig/20230419-091554-root.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47239 and previous config saved to /var/cache/conftool/dbconfig/20230419-091252-root.json
  • 09:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P47238 and previous config saved to /var/cache/conftool/dbconfig/20230419-090802-root.json
  • 09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47237 and previous config saved to /var/cache/conftool/dbconfig/20230419-090727-root.json
  • 09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:07 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:03 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:01 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 09:00 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47236 and previous config saved to /var/cache/conftool/dbconfig/20230419-090050-root.json
  • 09:00 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 08:59 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 08:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47235 and previous config saved to /var/cache/conftool/dbconfig/20230419-085748-root.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P47234 and previous config saved to /var/cache/conftool/dbconfig/20230419-085257-root.json
  • 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47233 and previous config saved to /var/cache/conftool/dbconfig/20230419-085222-root.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P47232 and previous config saved to /var/cache/conftool/dbconfig/20230419-084545-root.json
  • 08:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47231 and previous config saved to /var/cache/conftool/dbconfig/20230419-084243-root.json
  • 08:40 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 9%: Pooling', diff saved to https://phabricator.wikimedia.org/P47230 and previous config saved to /var/cache/conftool/dbconfig/20230419-083753-root.json
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P47229 and previous config saved to /var/cache/conftool/dbconfig/20230419-083717-root.json
  • 08:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:01:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Test
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 0:01:00 on db2185.codfw.wmnet,db[1115,1215].eqiad.wmnet with reason: Test
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P47228 and previous config saved to /var/cache/conftool/dbconfig/20230419-083040-root.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47227 and previous config saved to /var/cache/conftool/dbconfig/20230419-082738-root.json
  • 08:24 jnuche@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.5 refs T330211 (duration: 05m 43s)
  • 08:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P47226 and previous config saved to /var/cache/conftool/dbconfig/20230419-082345-root.json
  • 08:23 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 8%: Pooling', diff saved to https://phabricator.wikimedia.org/P47225 and previous config saved to /var/cache/conftool/dbconfig/20230419-082247-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P47224 and previous config saved to /var/cache/conftool/dbconfig/20230419-082213-root.json
  • 08:18 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.5 refs T330211
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P47223 and previous config saved to /var/cache/conftool/dbconfig/20230419-081535-root.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P47222 and previous config saved to /var/cache/conftool/dbconfig/20230419-080841-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 7%: Pooling', diff saved to https://phabricator.wikimedia.org/P47221 and previous config saved to /var/cache/conftool/dbconfig/20230419-080742-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P47220 and previous config saved to /var/cache/conftool/dbconfig/20230419-080708-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3316 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47219 and previous config saved to /var/cache/conftool/dbconfig/20230419-080030-root.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P47218 and previous config saved to /var/cache/conftool/dbconfig/20230419-075336-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 6%: Pooling', diff saved to https://phabricator.wikimedia.org/P47217 and previous config saved to /var/cache/conftool/dbconfig/20230419-075237-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47216 and previous config saved to /var/cache/conftool/dbconfig/20230419-075203-root.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P47215 and previous config saved to /var/cache/conftool/dbconfig/20230419-073831-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P47214 and previous config saved to /var/cache/conftool/dbconfig/20230419-073732-root.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P47213 and previous config saved to /var/cache/conftool/dbconfig/20230419-072326-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P47212 and previous config saved to /var/cache/conftool/dbconfig/20230419-072228-root.json
  • 07:15 XioNoX: update TLS cert on pfw - T334676
  • 07:13 kartik@deploy2002: Finished scap: Backport for Enable Content/Section translation on 6 Wikipedias (T327102) (duration: 09m 33s)
  • 07:10 XioNoX: push pfw policies - T334983
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47211 and previous config saved to /var/cache/conftool/dbconfig/20230419-070920-ladsgroup.json
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P47210 and previous config saved to /var/cache/conftool/dbconfig/20230419-070822-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P47209 and previous config saved to /var/cache/conftool/dbconfig/20230419-070723-root.json
  • 07:05 kartik@deploy2002: kartik: Backport for Enable Content/Section translation on 6 Wikipedias (T327102) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:03 kartik@deploy2002: Started scap: Backport for Enable Content/Section translation on 6 Wikipedias (T327102)
  • 06:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47208 and previous config saved to /var/cache/conftool/dbconfig/20230419-065413-ladsgroup.json
  • 06:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P47207 and previous config saved to /var/cache/conftool/dbconfig/20230419-065317-root.json
  • 06:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P47206 and previous config saved to /var/cache/conftool/dbconfig/20230419-065218-root.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 T335011', diff saved to https://phabricator.wikimedia.org/P47205 and previous config saved to /var/cache/conftool/dbconfig/20230419-064122-root.json
  • 06:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P47204 and previous config saved to /var/cache/conftool/dbconfig/20230419-063907-ladsgroup.json
  • 06:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 06:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P47203 and previous config saved to /var/cache/conftool/dbconfig/20230419-063812-root.json
  • 06:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1219 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P47202 and previous config saved to /var/cache/conftool/dbconfig/20230419-063713-root.json
  • 06:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47201 and previous config saved to /var/cache/conftool/dbconfig/20230419-062401-ladsgroup.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P47200 and previous config saved to /var/cache/conftool/dbconfig/20230419-062307-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113 (s5,s6)', diff saved to https://phabricator.wikimedia.org/P47197 and previous config saved to /var/cache/conftool/dbconfig/20230419-062123-root.json
  • 06:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T333332)', diff saved to https://phabricator.wikimedia.org/P47196 and previous config saved to /var/cache/conftool/dbconfig/20230419-062007-ladsgroup.json
  • 06:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47195 and previous config saved to /var/cache/conftool/dbconfig/20230419-061944-ladsgroup.json
  • 06:14 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1219 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47194 and previous config saved to /var/cache/conftool/dbconfig/20230419-061414-marostegui.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1212 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P47193 and previous config saved to /var/cache/conftool/dbconfig/20230419-060803-root.json
  • 06:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47192 and previous config saved to /var/cache/conftool/dbconfig/20230419-060437-ladsgroup.json
  • 05:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P47191 and previous config saved to /var/cache/conftool/dbconfig/20230419-054931-ladsgroup.json
  • 05:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47190 and previous config saved to /var/cache/conftool/dbconfig/20230419-053425-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47189 and previous config saved to /var/cache/conftool/dbconfig/20230419-053027-ladsgroup.json
  • 05:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47188 and previous config saved to /var/cache/conftool/dbconfig/20230419-053003-ladsgroup.json
  • 05:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47187 and previous config saved to /var/cache/conftool/dbconfig/20230419-051457-ladsgroup.json
  • 04:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P47186 and previous config saved to /var/cache/conftool/dbconfig/20230419-045951-ladsgroup.json
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47185 and previous config saved to /var/cache/conftool/dbconfig/20230419-044445-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T333332)', diff saved to https://phabricator.wikimedia.org/P47184 and previous config saved to /var/cache/conftool/dbconfig/20230419-044050-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47183 and previous config saved to /var/cache/conftool/dbconfig/20230419-044027-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47182 and previous config saved to /var/cache/conftool/dbconfig/20230419-042520-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P47181 and previous config saved to /var/cache/conftool/dbconfig/20230419-041013-ladsgroup.json
  • 03:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47180 and previous config saved to /var/cache/conftool/dbconfig/20230419-035507-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47178 and previous config saved to /var/cache/conftool/dbconfig/20230419-035112-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47177 and previous config saved to /var/cache/conftool/dbconfig/20230419-035048-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47176 and previous config saved to /var/cache/conftool/dbconfig/20230419-033542-ladsgroup.json
  • 03:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P47175 and previous config saved to /var/cache/conftool/dbconfig/20230419-032036-ladsgroup.json
  • 03:12 ejegg: payments-wiki upgraded from a01e5ae8 to 66be66e0
  • 03:11 ejegg: civicrm upgraded from 39bbe8cc to efdf9434
  • 03:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47174 and previous config saved to /var/cache/conftool/dbconfig/20230419-030530-ladsgroup.json
  • 03:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T333332)', diff saved to https://phabricator.wikimedia.org/P47173 and previous config saved to /var/cache/conftool/dbconfig/20230419-030234-ladsgroup.json
  • 03:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47172 and previous config saved to /var/cache/conftool/dbconfig/20230419-030205-ladsgroup.json
  • 02:47 ejegg: civicrm upgraded from dab8912d to 39bbe8cc
  • 02:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47171 and previous config saved to /var/cache/conftool/dbconfig/20230419-024658-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P47170 and previous config saved to /var/cache/conftool/dbconfig/20230419-023152-ladsgroup.json
  • 02:19 cstone: payments-wiki upgraded from c01a32c4 to a01e5ae8
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47168 and previous config saved to /var/cache/conftool/dbconfig/20230419-021646-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T333332)', diff saved to https://phabricator.wikimedia.org/P47167 and previous config saved to /var/cache/conftool/dbconfig/20230419-021051-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 02:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47166 and previous config saved to /var/cache/conftool/dbconfig/20230419-021028-ladsgroup.json
  • 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1075.eqiad.wmnet with OS bullseye
  • 02:03 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 02:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47165 and previous config saved to /var/cache/conftool/dbconfig/20230419-015522-ladsgroup.json
  • 01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1073.eqiad.wmnet with OS bullseye
  • 01:46 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:42 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1075.eqiad.wmnet with reason: host reimage
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P47164 and previous config saved to /var/cache/conftool/dbconfig/20230419-014016-ladsgroup.json
  • 01:38 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1075.eqiad.wmnet with reason: host reimage
  • 01:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1074.eqiad.wmnet with OS bullseye
  • 01:36 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:34 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47163 and previous config saved to /var/cache/conftool/dbconfig/20230419-012509-ladsgroup.json
  • 01:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1075.eqiad.wmnet with OS bullseye
  • 01:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T333332)', diff saved to https://phabricator.wikimedia.org/P47162 and previous config saved to /var/cache/conftool/dbconfig/20230419-012114-ladsgroup.json
  • 01:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 01:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 01:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 01:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1072.eqiad.wmnet with OS bullseye
  • 01:18 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 01:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47161 and previous config saved to /var/cache/conftool/dbconfig/20230419-011754-ladsgroup.json
  • 01:16 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1074.eqiad.wmnet with reason: host reimage
  • 01:10 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1074.eqiad.wmnet with reason: host reimage
  • 01:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1073.eqiad.wmnet with reason: host reimage
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P47160 and previous config saved to /var/cache/conftool/dbconfig/20230419-010247-ladsgroup.json
  • 01:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1073.eqiad.wmnet with reason: host reimage
  • 00:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1072.eqiad.wmnet with reason: host reimage
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P47159 and previous config saved to /var/cache/conftool/dbconfig/20230419-004741-ladsgroup.json
  • 00:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1072.eqiad.wmnet with reason: host reimage
  • 00:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1074.eqiad.wmnet with OS bullseye
  • 00:37 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1073.eqiad.wmnet with OS bullseye
  • 00:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47158 and previous config saved to /var/cache/conftool/dbconfig/20230419-003235-ladsgroup.json
  • 00:30 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1075.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1210 (T333332)', diff saved to https://phabricator.wikimedia.org/P47157 and previous config saved to /var/cache/conftool/dbconfig/20230419-002952-ladsgroup.json
  • 00:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 00:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47156 and previous config saved to /var/cache/conftool/dbconfig/20230419-002929-ladsgroup.json
  • 00:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be1072.eqiad.wmnet with OS bullseye
  • 00:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1074.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47155 and previous config saved to /var/cache/conftool/dbconfig/20230419-001423-ladsgroup.json
  • 00:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:02 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED

2023-04-18

  • 23:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P47154 and previous config saved to /var/cache/conftool/dbconfig/20230418-235916-ladsgroup.json
  • 23:58 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:53 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:50 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47153 and previous config saved to /var/cache/conftool/dbconfig/20230418-234410-ladsgroup.json
  • 23:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T333332)', diff saved to https://phabricator.wikimedia.org/P47152 and previous config saved to /var/cache/conftool/dbconfig/20230418-234032-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47151 and previous config saved to /var/cache/conftool/dbconfig/20230418-234008-ladsgroup.json
  • 23:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47150 and previous config saved to /var/cache/conftool/dbconfig/20230418-232502-ladsgroup.json
  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P47149 and previous config saved to /var/cache/conftool/dbconfig/20230418-230956-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47148 and previous config saved to /var/cache/conftool/dbconfig/20230418-225449-ladsgroup.json
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T333332)', diff saved to https://phabricator.wikimedia.org/P47147 and previous config saved to /var/cache/conftool/dbconfig/20230418-225211-ladsgroup.json
  • 22:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 22:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47146 and previous config saved to /var/cache/conftool/dbconfig/20230418-225148-ladsgroup.json
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47145 and previous config saved to /var/cache/conftool/dbconfig/20230418-223642-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P47144 and previous config saved to /var/cache/conftool/dbconfig/20230418-222135-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47143 and previous config saved to /var/cache/conftool/dbconfig/20230418-220629-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1183 (T333332)', diff saved to https://phabricator.wikimedia.org/P47142 and previous config saved to /var/cache/conftool/dbconfig/20230418-220350-ladsgroup.json
  • 22:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 22:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47141 and previous config saved to /var/cache/conftool/dbconfig/20230418-220327-ladsgroup.json
  • 21:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47140 and previous config saved to /var/cache/conftool/dbconfig/20230418-214820-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P47139 and previous config saved to /var/cache/conftool/dbconfig/20230418-213314-ladsgroup.json
  • 21:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47138 and previous config saved to /var/cache/conftool/dbconfig/20230418-211808-ladsgroup.json
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T333332)', diff saved to https://phabricator.wikimedia.org/P47137 and previous config saved to /var/cache/conftool/dbconfig/20230418-211529-ladsgroup.json
  • 21:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 21:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 21:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47136 and previous config saved to /var/cache/conftool/dbconfig/20230418-211354-ladsgroup.json
  • 20:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47134 and previous config saved to /var/cache/conftool/dbconfig/20230418-205848-ladsgroup.json
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P47133 and previous config saved to /var/cache/conftool/dbconfig/20230418-204339-ladsgroup.json
  • 20:32 TheresNoTime: close UTC late backport window
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47132 and previous config saved to /var/cache/conftool/dbconfig/20230418-202833-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47131 and previous config saved to /var/cache/conftool/dbconfig/20230418-202554-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47130 and previous config saved to /var/cache/conftool/dbconfig/20230418-202530-ladsgroup.json
  • 20:25 samtar@deploy2002: Finished scap: Backport for Remove weird VisualEditor config hack from 2015, Simplify some more VisualEditor configuration (duration: 10m 32s)
  • 20:16 samtar@deploy2002: matmarex and samtar: Backport for Remove weird VisualEditor config hack from 2015, Simplify some more VisualEditor configuration synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:14 samtar@deploy2002: Started scap: Backport for Remove weird VisualEditor config hack from 2015, Simplify some more VisualEditor configuration
  • 20:13 samtar@deploy2002: Finished scap: Backport for Enable visual enhancements on pages using on dewiki (T318596) (duration: 07m 49s)
  • 20:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47129 and previous config saved to /var/cache/conftool/dbconfig/20230418-201024-ladsgroup.json
  • 20:06 samtar@deploy2002: matmarex and samtar: Backport for Enable visual enhancements on pages using on dewiki (T318596) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:05 samtar@deploy2002: Started scap: Backport for Enable visual enhancements on pages using on dewiki (T318596)
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P47126 and previous config saved to /var/cache/conftool/dbconfig/20230418-195518-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47125 and previous config saved to /var/cache/conftool/dbconfig/20230418-194401-ladsgroup.json
  • 19:43 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 19:40 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47124 and previous config saved to /var/cache/conftool/dbconfig/20230418-194012-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T333332)', diff saved to https://phabricator.wikimedia.org/P47123 and previous config saved to /var/cache/conftool/dbconfig/20230418-193832-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47122 and previous config saved to /var/cache/conftool/dbconfig/20230418-193809-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P47121 and previous config saved to /var/cache/conftool/dbconfig/20230418-192855-ladsgroup.json
  • 19:24 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P47120 and previous config saved to /var/cache/conftool/dbconfig/20230418-192302-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P47119 and previous config saved to /var/cache/conftool/dbconfig/20230418-191348-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P47118 and previous config saved to /var/cache/conftool/dbconfig/20230418-190756-ladsgroup.json
  • 19:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47117 and previous config saved to /var/cache/conftool/dbconfig/20230418-185842-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T333332)', diff saved to https://phabricator.wikimedia.org/P47116 and previous config saved to /var/cache/conftool/dbconfig/20230418-185627-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47115 and previous config saved to /var/cache/conftool/dbconfig/20230418-185604-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47114 and previous config saved to /var/cache/conftool/dbconfig/20230418-185250-ladsgroup.json
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:51 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host cloudswift1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T333332)', diff saved to https://phabricator.wikimedia.org/P47113 and previous config saved to /var/cache/conftool/dbconfig/20230418-185010-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudswift100[1-2] - pt1979@cumin2002"
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 18:48 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS entries for cloudswift100[1-2] - pt1979@cumin2002"
  • 18:47 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 18:46 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 18:44 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 18:43 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1020
  • 18:43 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1020
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P47112 and previous config saved to /var/cache/conftool/dbconfig/20230418-184058-ladsgroup.json
  • 18:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:28 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:26 taavi@deploy2002: Finished scap: 909693 and 909700 (duration: 07m 36s)
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P47111 and previous config saved to /var/cache/conftool/dbconfig/20230418-182551-ladsgroup.json
  • 18:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:19 taavi@deploy2002: taavi: 909693 and 909700 synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:18 taavi@deploy2002: Started scap: 909693 and 909700
  • 18:15 taavi@deploy2002: Finished scap: Backport for Add temporary message for Graph being disabled (T334895), Add temporary message for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895) (duration: 37m 33s)
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47110 and previous config saved to /var/cache/conftool/dbconfig/20230418-181045-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T333332)', diff saved to https://phabricator.wikimedia.org/P47109 and previous config saved to /var/cache/conftool/dbconfig/20230418-180830-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47108 and previous config saved to /var/cache/conftool/dbconfig/20230418-180807-ladsgroup.json
  • 17:59 taavi@deploy2002: taavi: Backport for Add temporary message for Graph being disabled (T334895), Add temporary message for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P47107 and previous config saved to /var/cache/conftool/dbconfig/20230418-175301-ladsgroup.json
  • 17:48 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 jclark@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 17:46 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P47106 and previous config saved to /var/cache/conftool/dbconfig/20230418-173754-ladsgroup.json
  • 17:37 taavi@deploy2002: Started scap: Backport for Add temporary message for Graph being disabled (T334895), Add temporary message for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895), Add temporary tracking category for Graph being disabled (T334895)
  • 17:26 htriedman@deploy2002: Finished deploy [airflow-dags/platform_eng@3b8ab60]: (no justification provided) (duration: 00m 12s)
  • 17:26 htriedman@deploy2002: Started deploy [airflow-dags/platform_eng@3b8ab60]: (no justification provided)
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47105 and previous config saved to /var/cache/conftool/dbconfig/20230418-172247-ladsgroup.json
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T333332)', diff saved to https://phabricator.wikimedia.org/P47104 and previous config saved to /var/cache/conftool/dbconfig/20230418-172032-ladsgroup.json
  • 17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47103 and previous config saved to /var/cache/conftool/dbconfig/20230418-171951-ladsgroup.json
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47102 and previous config saved to /var/cache/conftool/dbconfig/20230418-170445-ladsgroup.json
  • 16:57 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host htmldumper1001.eqiad.wmnet with OS bullseye
  • 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P47101 and previous config saved to /var/cache/conftool/dbconfig/20230418-164939-ladsgroup.json
  • 16:44 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47100 and previous config saved to /var/cache/conftool/dbconfig/20230418-163432-ladsgroup.json
  • 16:33 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on htmldumper1001.eqiad.wmnet with reason: host reimage
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47099 and previous config saved to /var/cache/conftool/dbconfig/20230418-163217-ladsgroup.json
  • 16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47098 and previous config saved to /var/cache/conftool/dbconfig/20230418-163154-ladsgroup.json
  • 16:29 ariel@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on htmldumper1001.eqiad.wmnet with reason: host reimage
  • 16:23 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47097 and previous config saved to /var/cache/conftool/dbconfig/20230418-161648-ladsgroup.json
  • 16:14 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: Depool from primary DC following network maintenance
  • 16:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 16:09 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 16:09 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: Depool from primary DC following network maintenance
  • 16:08 claime: depooling restbase-async from codfw
  • 16:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: End of maintenance - T333377
  • 16:08 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 16:04 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
  • 16:03 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 16:03 cgoubert@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
  • 16:03 ariel@cumin1001: START - Cookbook sre.hosts.reimage for host htmldumper1001.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P47095 and previous config saved to /var/cache/conftool/dbconfig/20230418-160141-ladsgroup.json
  • 16:00 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 16:00 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
  • 15:54 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 15:54 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47093 and previous config saved to /var/cache/conftool/dbconfig/20230418-154635-ladsgroup.json
  • 15:45 sukhe: enable puppet in A:lvs and A:codfw to test CR 908909
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T333332)', diff saved to https://phabricator.wikimedia.org/P47092 and previous config saved to /var/cache/conftool/dbconfig/20230418-154219-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47091 and previous config saved to /var/cache/conftool/dbconfig/20230418-154156-ladsgroup.json
  • 15:38 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 15:38 cgoubert@cumin1001: END (ERROR) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: End of maintenance - T333377
  • 15:37 sukhe: disable puppet in A:lvs and A:codfw to test CR 908909
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P47090 and previous config saved to /var/cache/conftool/dbconfig/20230418-152649-ladsgroup.json
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P47089 and previous config saved to /var/cache/conftool/dbconfig/20230418-151143-ladsgroup.json
  • 15:07 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: End of maintenance - T333377
  • 15:07 claime: repooling all eqiad active/active services post T333377
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47088 and previous config saved to /var/cache/conftool/dbconfig/20230418-145637-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T333332)', diff saved to https://phabricator.wikimedia.org/P47087 and previous config saved to /var/cache/conftool/dbconfig/20230418-145422-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47086 and previous config saved to /var/cache/conftool/dbconfig/20230418-145359-ladsgroup.json
  • 14:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P47085 and previous config saved to /var/cache/conftool/dbconfig/20230418-143852-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P47084 and previous config saved to /var/cache/conftool/dbconfig/20230418-142346-ladsgroup.json
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47083 and previous config saved to /var/cache/conftool/dbconfig/20230418-140840-ladsgroup.json
  • 14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1018.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T333332)', diff saved to https://phabricator.wikimedia.org/P47082 and previous config saved to /var/cache/conftool/dbconfig/20230418-140626-ladsgroup.json
  • 14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[5-7].eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:06 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase103[03].eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47081 and previous config saved to /var/cache/conftool/dbconfig/20230418-140602-ladsgroup.json
  • 14:04 sukhe: running authdns-update to repool eqiad after switch maint: T333377
  • 13:57 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-worker1110.eqiad.wmnet
  • 13:57 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1110.eqiad.wmnet
  • 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for asw2-d-eqiad
  • 13:55 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for asw2-d-eqiad
  • 13:52 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 270 hosts
  • 13:51 jmm@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ldap-replica1004.wikimedia.org
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47080 and previous config saved to /var/cache/conftool/dbconfig/20230418-135056-ladsgroup.json
  • 13:49 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 270 hosts
  • 13:41 elukey: restart etcdmirror on conf2005 (down due to conf1009 under maintenance)
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P47079 and previous config saved to /var/cache/conftool/dbconfig/20230418-133549-ladsgroup.json
  • 13:25 topranks: Rebooting asw2-d-eqiad virtual-chassis (all row D top-of-rack switches) to upgrade JunOS. Row D going down T333377
  • 13:22 xSavitar: RESTBase/Proton deployment complete
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47078 and previous config saved to /var/cache/conftool/dbconfig/20230418-132042-ladsgroup.json
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T333332)', diff saved to https://phabricator.wikimedia.org/P47076 and previous config saved to /var/cache/conftool/dbconfig/20230418-131827-ladsgroup.json
  • 13:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47075 and previous config saved to /var/cache/conftool/dbconfig/20230418-131738-ladsgroup.json
  • 13:16 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:15 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:15 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw2-d-eqiad with reason: eqiad row D upgrade
  • 13:15 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on asw2-d-eqiad with reason: eqiad row D upgrade
  • 13:14 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:13 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:12 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:12 jbond: disable puppet fleet wide T333377
  • 13:11 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 270 hosts with reason: eqiad row D upgrade
  • 13:10 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:06 topranks: disabling ping offload on cr1-eqiad and cr2-eqiad in advance of row D switch upgrade T333377
  • 13:06 jbond: upload libapache2-mod-auth-cas_1.2-1+wmf12u1
  • 13:04 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 270 hosts with reason: eqiad row D upgrade
  • 13:03 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47074 and previous config saved to /var/cache/conftool/dbconfig/20230418-130231-ladsgroup.json
  • 13:02 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P47073 and previous config saved to /var/cache/conftool/dbconfig/20230418-124724-ladsgroup.json
  • 12:40 sukhe: run authdns-update to depool eqiad for switch upgrade
  • 12:39 moritzm: imported puppet 5.5.22-2+deb12u2 for bookworm-wikimedia T330495
  • 12:36 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) status all services in all: None - None
  • 12:36 jiji@cumin1001: START - Cookbook sre.discovery.datacenter status all services in all: None - None
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47072 and previous config saved to /var/cache/conftool/dbconfig/20230418-123218-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T333332)', diff saved to https://phabricator.wikimedia.org/P47071 and previous config saved to /var/cache/conftool/dbconfig/20230418-122903-ladsgroup.json
  • 12:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47070 and previous config saved to /var/cache/conftool/dbconfig/20230418-122839-ladsgroup.json
  • 12:27 jiji@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
  • 12:27 jiji@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
  • 12:26 jiji@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47069 and previous config saved to /var/cache/conftool/dbconfig/20230418-121333-ladsgroup.json
  • 11:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P47068 and previous config saved to /var/cache/conftool/dbconfig/20230418-115827-ladsgroup.json
  • 11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase103[03].eqiad.wmnet
  • 11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[5-7].eqiad.wmnet
  • 11:57 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1018.eqiad.wmnet
  • 11:50 jiji@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row D switches upgrade - T333377
  • 11:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1102.eqiad.wmnet
  • 11:49 jynus@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:49 jynus@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1102.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
  • 11:48 effie: depooling eqiad due to eqiad row D switches upgrade - T333377
  • 11:46 jynus@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1102.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47067 and previous config saved to /var/cache/conftool/dbconfig/20230418-114320-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T333332)', diff saved to https://phabricator.wikimedia.org/P47066 and previous config saved to /var/cache/conftool/dbconfig/20230418-114106-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47065 and previous config saved to /var/cache/conftool/dbconfig/20230418-114042-ladsgroup.json
  • 11:39 jynus@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 jynus@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1102.eqiad.wmnet
  • 11:32 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts an-worker1110.eqiad.wmnet
  • 11:30 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1110.eqiad.wmnet
  • 11:27 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1116.eqiad.wmnet
  • 11:27 jynus@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:27 jynus@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1116.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47064 and previous config saved to /var/cache/conftool/dbconfig/20230418-112536-ladsgroup.json
  • 11:24 jynus@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1116.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1001"
  • 11:22 jynus@cumin1001: START - Cookbook sre.dns.netbox
  • 11:22 taavi@deploy2002: Finished scap: Backport for Hide raw Graph tags (T334895) (duration: 07m 09s)
  • 11:16 jynus@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1116.eqiad.wmnet
  • 11:16 taavi@deploy2002: taavi: Backport for Hide raw Graph tags (T334895) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 11:14 taavi@deploy2002: Started scap: Backport for Hide raw Graph tags (T334895)
  • 11:10 urbanecm@deploy2002: Finished scap: Backport for [Growth] Prepare for a Personalized praise config variable change (T334630) (duration: 06m 43s)
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P47063 and previous config saved to /var/cache/conftool/dbconfig/20230418-111029-ladsgroup.json
  • 11:03 urbanecm@deploy2002: Started scap: Backport for [Growth] Prepare for a Personalized praise config variable change (T334630)
  • 11:00 elukey: puppet cert clean kafka_jumbo-eqiad_broker on puppetmaster1001 - remove old certificate (not used anymore)
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47062 and previous config saved to /var/cache/conftool/dbconfig/20230418-105523-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T333332)', diff saved to https://phabricator.wikimedia.org/P47061 and previous config saved to /var/cache/conftool/dbconfig/20230418-105308-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 10:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47060 and previous config saved to /var/cache/conftool/dbconfig/20230418-105131-ladsgroup.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P47059 and previous config saved to /var/cache/conftool/dbconfig/20230418-103625-ladsgroup.json
  • 10:25 jmm@puppetmaster1001: conftool action : set/pooled=no; selector: name=ldap-replica1004.wikimedia.org
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P47058 and previous config saved to /var/cache/conftool/dbconfig/20230418-102119-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47057 and previous config saved to /var/cache/conftool/dbconfig/20230418-100612-ladsgroup.json
  • 10:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1218 (T333332)', diff saved to https://phabricator.wikimedia.org/P47056 and previous config saved to /var/cache/conftool/dbconfig/20230418-100359-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 08:38 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.5 refs T330211
  • 08:37 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 08:37 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-worker1110.eqiad.wmnet with reason: Upgrading RAID controller firmware
  • 08:12 zabe@deploy2002: Finished scap: Backport for Add separate config for enabling JsonConfig (duration: 07m 43s)
  • 08:08 dcausse: repooling wdqs2011
  • 08:06 zabe@deploy2002: zabe: Backport for Add separate config for enabling JsonConfig synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 08:04 zabe@deploy2002: Started scap: Backport for Add separate config for enabling JsonConfig
  • 07:51 cgoubert@deploy2002: Finished scap: Forcing redeploy (duration: 02m 31s)
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1212 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P47055 and previous config saved to /var/cache/conftool/dbconfig/20230418-075032-marostegui.json
  • 07:48 cgoubert@deploy2002: Started scap: Forcing redeploy
  • 07:41 zabe@deploy2002: Finished scap: T334895 (duration: 06m 42s)
  • 07:35 zabe@deploy2002: Started scap: T334895
  • 07:30 zabe@deploy2002: Finished scap: T334895 (duration: 06m 37s)
  • 07:24 zabe@deploy2002: Started scap: T334895
  • 07:20 zabe@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.41.0-wmf.4" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.8ZJFnr01rx"' returned non-zero exit status 255. (duration: 00m 00s)
  • 07:20 zabe@deploy2002: Started scap: T334895
  • 07:18 zabe@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript mergeMessageFileList.php --wiki=aawiki --force-version "1.41.0-wmf.4" --list-file="/srv/mediawiki-staging/wmf-config/extension-list" --output="/tmp/tmp.c2xgrltrG8"' returned non-zero exit status 255. (duration: 00m 01s)
  • 07:18 zabe@deploy2002: Started scap: T334895
  • 07:16 joe: added requestctl rule for T332061 in logging mode
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1109.eqiad.wmnet
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1109.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:05 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1109.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:03 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:59 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1109.eqiad.wmnet
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db2142 to x2 primary T334821', diff saved to https://phabricator.wikimedia.org/P47054 and previous config saved to /var/cache/conftool/dbconfig/20230418-061101-root.json
  • 06:06 marostegui: Starting x2 codfw failover from db2144 to db2142 - T334821
  • 06:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover x2 T334821
  • 06:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16591
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover x2 T334821
  • 06:01 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16591
  • 03:53 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.3 (duration: 02m 08s)
  • 03:51 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.5 refs T330211 (duration: 49m 03s)
  • 03:30 eileen: civicrm upgraded from 0b8e303d to dab8912d
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.5 refs T330211
  • 01:38 eileen: civicrm upgraded from cd0f886d to 0b8e303d
  • 00:54 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cassandra-dev2001.codfw.wmnet
  • 00:54 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for cassandra-dev2001.codfw.wmnet
  • 00:28 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cassandra-dev2001.codfw.wmnet with reason: testing systemd unit changes — T327954
  • 00:28 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on cassandra-dev2001.codfw.wmnet with reason: testing systemd unit changes — T327954
  • 00:26 eileen: config revision changed from 7da418a4 to f25cb7cc

2023-04-17

  • 22:00 zabe@deploy2002: Finished scap: Backport for Fix infinite loop for self-redirects with variants conversion (T333050) (duration: 06m 52s)
  • 22:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: T333377 maint
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: T333377 maint
  • 21:54 zabe@deploy2002: zabe: Backport for Fix infinite loop for self-redirects with variants conversion (T333050) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 21:53 zabe@deploy2002: Started scap: Backport for Fix infinite loop for self-redirects with variants conversion (T333050)
  • 21:45 zabe@deploy2002: Finished scap: Backport for RC: Handle deleted story (T334829) (duration: 07m 01s)
  • 21:39 zabe@deploy2002: zabe: Backport for RC: Handle deleted story (T334829) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:38 zabe@deploy2002: Started scap: Backport for RC: Handle deleted story (T334829)
  • 21:20 sbassett: Deployed updated mitigation for T333140
  • 21:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47053 and previous config saved to /var/cache/conftool/dbconfig/20230417-211909-ladsgroup.json
  • 21:17 inflatador: bking@cumin1001 ban cloudelastic1004 for upcoming switch maintenance T333377
  • 21:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47052 and previous config saved to /var/cache/conftool/dbconfig/20230417-210403-ladsgroup.json
  • 20:52 urbanecm@deploy2002: Finished scap: Backport for [trwikiquote] Add a HD logo for Vector legacy (T334732) (duration: 07m 02s)
  • 20:50 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P47051 and previous config saved to /var/cache/conftool/dbconfig/20230417-204856-ladsgroup.json
  • 20:48 joal@deploy2002: Started restart [analytics/aqs/deploy@d273fde]: Restarting AQS to pick up new druid datasource
  • 20:46 urbanecm@deploy2002: urbanecm and superpes: Backport for [trwikiquote] Add a HD logo for Vector legacy (T334732) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:44 urbanecm@deploy2002: Started scap: Backport for [trwikiquote] Add a HD logo for Vector legacy (T334732)
  • 20:35 urbanecm@deploy2002: Finished scap: Backport for Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877), Mobile editor: Don't try to take over on non-wikitext content (T334799) (duration: 09m 14s)
  • 20:35 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47049 and previous config saved to /var/cache/conftool/dbconfig/20230417-203350-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T333332)', diff saved to https://phabricator.wikimedia.org/P47048 and previous config saved to /var/cache/conftool/dbconfig/20230417-203108-ladsgroup.json
  • 20:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47047 and previous config saved to /var/cache/conftool/dbconfig/20230417-203056-ladsgroup.json
  • 20:27 urbanecm@deploy2002: urbanecm and matmarex: Backport for Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877), Mobile editor: Don't try to take over on non-wikitext content (T334799) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:26 urbanecm@deploy2002: Started scap: Backport for Mobile editor: Don't try to take over if the form has already been submitted (T334794 T334797 T334877), Mobile editor: Don't try to take over on non-wikitext content (T334799)
  • 20:25 urbanecm@deploy2002: Finished scap: Backport for Stop using redundant $wmg variables for VisualEditor extension (T119117) (duration: 08m 19s)
  • 20:18 urbanecm@deploy2002: urbanecm and matmarex: Backport for Stop using redundant $wmg variables for VisualEditor extension (T119117) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:17 urbanecm@deploy2002: Started scap: Backport for Stop using redundant $wmg variables for VisualEditor extension (T119117)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47046 and previous config saved to /var/cache/conftool/dbconfig/20230417-201549-ladsgroup.json
  • 20:14 urbanecm@deploy2002: Finished scap: Backport for ruwiki: Allow sysop to add/remove confirmed group (T334780) (duration: 07m 31s)
  • 20:08 urbanecm@deploy2002: urbanecm and stang: Backport for ruwiki: Allow sysop to add/remove confirmed group (T334780) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:06 urbanecm@deploy2002: Started scap: Backport for ruwiki: Allow sysop to add/remove confirmed group (T334780)
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P47045 and previous config saved to /var/cache/conftool/dbconfig/20230417-200043-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47044 and previous config saved to /var/cache/conftool/dbconfig/20230417-194537-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47043 and previous config saved to /var/cache/conftool/dbconfig/20230417-194253-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47042 and previous config saved to /var/cache/conftool/dbconfig/20230417-194229-ladsgroup.json
  • 19:32 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47041 and previous config saved to /var/cache/conftool/dbconfig/20230417-192723-ladsgroup.json
  • 19:16 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 19:13 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 19:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P47040 and previous config saved to /var/cache/conftool/dbconfig/20230417-191217-ladsgroup.json
  • 19:00 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47039 and previous config saved to /var/cache/conftool/dbconfig/20230417-185710-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T333332)', diff saved to https://phabricator.wikimedia.org/P47038 and previous config saved to /var/cache/conftool/dbconfig/20230417-184525-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47037 and previous config saved to /var/cache/conftool/dbconfig/20230417-184502-ladsgroup.json
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47036 and previous config saved to /var/cache/conftool/dbconfig/20230417-182956-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P47035 and previous config saved to /var/cache/conftool/dbconfig/20230417-181449-ladsgroup.json
  • 17:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47034 and previous config saved to /var/cache/conftool/dbconfig/20230417-175943-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P47033 and previous config saved to /var/cache/conftool/dbconfig/20230417-175700-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47032 and previous config saved to /var/cache/conftool/dbconfig/20230417-175636-ladsgroup.json
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47031 and previous config saved to /var/cache/conftool/dbconfig/20230417-174130-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P47030 and previous config saved to /var/cache/conftool/dbconfig/20230417-172623-ladsgroup.json
  • 17:26 SandraEbele: restarted turnilo with ‘sudo systemctl restart turnilo’
  • 17:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010']
  • 17:18 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
  • 17:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2010']
  • 17:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
  • 17:14 jhancock@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['backup2010']
  • 17:14 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
  • 17:13 SandraEbele: restarted Oozie page view-druid-daily job 0174450-220913162928808-oozie-oozi-C
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47029 and previous config saved to /var/cache/conftool/dbconfig/20230417-171117-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T333332)', diff saved to https://phabricator.wikimedia.org/P47028 and previous config saved to /var/cache/conftool/dbconfig/20230417-170838-ladsgroup.json
  • 17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47027 and previous config saved to /var/cache/conftool/dbconfig/20230417-170757-ladsgroup.json
  • 17:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['backup2010']
  • 17:03 volans: installed spicerack_6.4.2 on cumin1001
  • 17:01 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2010']
  • 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@f8dad05]: analytics: deploy Airflow ArchiveOperator should have a number of retries of 0. T332216 (duration: 00m 12s)
  • 16:59 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@f8dad05]: analytics: deploy Airflow ArchiveOperator should have a number of retries of 0. T332216
  • 16:56 SandraEbele: restarted oozie page view-druid-hourly job 0174449-220913162928808-oozie-oozi-C
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47026 and previous config saved to /var/cache/conftool/dbconfig/20230417-165251-ladsgroup.json
  • 16:49 volans: installed spicerack_6.4.2 on cumin2002
  • 16:46 volans: uploaded spicerack_6.4.2 to apt.wikimedia.org bullseye-wikimedia
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2010.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P47025 and previous config saved to /var/cache/conftool/dbconfig/20230417-163744-ladsgroup.json
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47024 and previous config saved to /var/cache/conftool/dbconfig/20230417-162238-ladsgroup.json
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T333332)', diff saved to https://phabricator.wikimedia.org/P47023 and previous config saved to /var/cache/conftool/dbconfig/20230417-161955-ladsgroup.json
  • 16:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47022 and previous config saved to /var/cache/conftool/dbconfig/20230417-161931-ladsgroup.json
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47021 and previous config saved to /var/cache/conftool/dbconfig/20230417-160425-ladsgroup.json
  • 16:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2010.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P47020 and previous config saved to /var/cache/conftool/dbconfig/20230417-155654-root.json
  • 15:53 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 30s)
  • 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked backup2010 hosts in codfw - jhancock@cumin2002"
  • 15:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add newly racked backup2010 hosts in codfw - jhancock@cumin2002"
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P47019 and previous config saved to /var/cache/conftool/dbconfig/20230417-154918-ladsgroup.json
  • 15:48 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 59s)
  • 15:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P47018 and previous config saved to /var/cache/conftool/dbconfig/20230417-154149-root.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47017 and previous config saved to /var/cache/conftool/dbconfig/20230417-153412-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T333332)', diff saved to https://phabricator.wikimedia.org/P47016 and previous config saved to /var/cache/conftool/dbconfig/20230417-153134-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47015 and previous config saved to /var/cache/conftool/dbconfig/20230417-152916-ladsgroup.json
  • 15:27 urbanecm@deploy2002: Finished scap: Expose the sfsblock-bypass right so it can be assigned to global groups (T334856; second try) (duration: 06m 22s)
  • 15:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P47014 and previous config saved to /var/cache/conftool/dbconfig/20230417-152644-root.json
  • 15:21 urbanecm@deploy2002: Started scap: Expose the sfsblock-bypass right so it can be assigned to global groups (T334856; second try)
  • 15:20 urbanecm@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (duration: 23m 03s)
  • 15:18 sukhe: run authdns-update and repool eqiad
  • 15:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47013 and previous config saved to /var/cache/conftool/dbconfig/20230417-151409-ladsgroup.json
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P47012 and previous config saved to /var/cache/conftool/dbconfig/20230417-151138-root.json
  • 15:09 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs1020
  • 15:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs1020
  • 15:07 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host lvs1020.eqiad.wmnet with OS bullseye
  • 15:07 vgutierrez: rolling restart of HAProxy in the text cluster - T334448
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P47011 and previous config saved to /var/cache/conftool/dbconfig/20230417-145902-ladsgroup.json
  • 14:57 urbanecm@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage
  • 14:57 urbanecm@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (duration: 00m 01s)
  • 14:57 urbanecm@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage
  • 14:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P47010 and previous config saved to /var/cache/conftool/dbconfig/20230417-145633-root.json
  • 14:55 claime: repooled mw1375.eqiad.wmnet
  • 14:54 claime: depooling mw1375.eqiad.wmnet
  • 14:53 ladsgroup@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: LVS Maint - Outage (T334703) (duration: 13m 39s)
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47009 and previous config saved to /var/cache/conftool/dbconfig/20230417-144356-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1222 (T333332)', diff saved to https://phabricator.wikimedia.org/P47008 and previous config saved to /var/cache/conftool/dbconfig/20230417-144133-ladsgroup.json
  • 14:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P47007 and previous config saved to /var/cache/conftool/dbconfig/20230417-144128-root.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47006 and previous config saved to /var/cache/conftool/dbconfig/20230417-144109-ladsgroup.json
  • 14:40 ladsgroup@deploy2002: Locking from deployment [ALL REPOSITORIES]: LVS Maint - Outage (T334703)
  • 14:31 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid
  • 14:31 claime: repooling parsoid in eqiad
  • 14:31 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver
  • 14:31 claime: repooling appserver in eqiad
  • 14:30 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver
  • 14:30 claime: repooling api_appserver in eqiad
  • 14:30 sukhe: running authdns-update to depool eqiad
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1119 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P47005 and previous config saved to /var/cache/conftool/dbconfig/20230417-142623-root.json
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47004 and previous config saved to /var/cache/conftool/dbconfig/20230417-142603-ladsgroup.json
  • 14:25 urbanecm@deploy2002: Finished scap: Backport for Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856) (duration: 07m 36s)
  • 14:24 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
  • 14:21 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1020.eqiad.wmnet with reason: host reimage
  • 14:19 urbanecm@deploy2002: urbanecm and maurelio: Backport for Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:17 urbanecm@deploy2002: Started scap: Backport for Expose the 'sfsblock-bypass' right so it can be assigned to global groups (T334856)
  • 14:14 elukey: upload amd-k8s-device-plugin deb (1.25.2.3-1) to bullseye-wikimedia - T333009
  • 14:12 claime: Migrated linkrecommendation to mw-api-int - T334060
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P47003 and previous config saved to /var/cache/conftool/dbconfig/20230417-141056-ladsgroup.json
  • 14:10 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 14:09 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 14:08 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 14:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1020.eqiad.wmnet with OS bullseye
  • 14:07 claime: Migrating linkrecommendation to mw-api-int - T334060
  • 14:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47002 and previous config saved to /var/cache/conftool/dbconfig/20230417-135550-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T333332)', diff saved to https://phabricator.wikimedia.org/P47001 and previous config saved to /var/cache/conftool/dbconfig/20230417-135334-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P47000 and previous config saved to /var/cache/conftool/dbconfig/20230417-135311-ladsgroup.json
  • 13:47 moritzm: installing mariadb-10.3 security updates (Debian packaged version, not the wmf-mariadb packages)
  • 13:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.e4 in codfw
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P46999 and previous config saved to /var/cache/conftool/dbconfig/20230417-133804-ladsgroup.json
  • 13:37 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.e4 in codfw
  • 13:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1132.eqiad.wmnet
  • 13:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-worker1132.eqiad.wmnet
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P46998 and previous config saved to /var/cache/conftool/dbconfig/20230417-132258-ladsgroup.json
  • 13:12 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:10 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:10 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:08 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:08 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P46997 and previous config saved to /var/cache/conftool/dbconfig/20230417-130751-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T333332)', diff saved to https://phabricator.wikimedia.org/P46996 and previous config saved to /var/cache/conftool/dbconfig/20230417-130535-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46995 and previous config saved to /var/cache/conftool/dbconfig/20230417-130512-ladsgroup.json
  • 12:59 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 12:59 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 12:59 claime: Migrating linkrecommendation staging to mw-api-int - T334060
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P46994 and previous config saved to /var/cache/conftool/dbconfig/20230417-125006-ladsgroup.json
  • 12:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P46993 and previous config saved to /var/cache/conftool/dbconfig/20230417-123500-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46992 and previous config saved to /var/cache/conftool/dbconfig/20230417-121953-ladsgroup.json
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46991 and previous config saved to /var/cache/conftool/dbconfig/20230417-121734-ladsgroup.json
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46990 and previous config saved to /var/cache/conftool/dbconfig/20230417-121710-ladsgroup.json
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P46989 and previous config saved to /var/cache/conftool/dbconfig/20230417-120204-ladsgroup.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1119 T326669', diff saved to https://phabricator.wikimedia.org/P46987 and previous config saved to /var/cache/conftool/dbconfig/20230417-115847-marostegui.json
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P46986 and previous config saved to /var/cache/conftool/dbconfig/20230417-114658-ladsgroup.json
  • 11:33 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1132.eqiad.wmnet
  • 11:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46985 and previous config saved to /var/cache/conftool/dbconfig/20230417-113152-ladsgroup.json
  • 11:30 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1132.eqiad.wmnet
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46984 and previous config saved to /var/cache/conftool/dbconfig/20230417-113031-ladsgroup.json
  • 11:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46983 and previous config saved to /var/cache/conftool/dbconfig/20230417-113008-ladsgroup.json
  • 11:23 kamila@deploy2002: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1109 from dbctl T334820', diff saved to https://phabricator.wikimedia.org/P46981 and previous config saved to /var/cache/conftool/dbconfig/20230417-111724-marostegui.json
  • 11:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P46980 and previous config saved to /var/cache/conftool/dbconfig/20230417-111501-ladsgroup.json
  • 11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS buster
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P46979 and previous config saved to /var/cache/conftool/dbconfig/20230417-105955-ladsgroup.json
  • 10:59 ladsgroup@deploy2002: Finished scap: Backport for filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138) (duration: 07m 16s)
  • 10:54 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 10:53 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.98 in codfw
  • 10:53 ladsgroup@deploy2002: ladsgroup: Backport for filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 10:51 ladsgroup@deploy2002: Started scap: Backport for filebackend: Find thumbnails from all backends in FileBackendMultiWrite (T331138)
  • 10:51 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 10:50 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.98 in codfw
  • 10:49 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-commons-local-public.98 in eqiad
  • 10:46 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-commons-local-public.98 in eqiad
  • 10:45 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.remove-ghost-objects (exit_code=0) from container wikipedia-en-local-public.1a in eqiad
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46978 and previous config saved to /var/cache/conftool/dbconfig/20230417-104449-ladsgroup.json
  • 10:42 mvernon@cumin2002: START - Cookbook sre.swift.remove-ghost-objects from container wikipedia-en-local-public.1a in eqiad
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46977 and previous config saved to /var/cache/conftool/dbconfig/20230417-104229-ladsgroup.json
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46976 and previous config saved to /var/cache/conftool/dbconfig/20230417-104144-ladsgroup.json
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P46974 and previous config saved to /var/cache/conftool/dbconfig/20230417-102637-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P46973 and previous config saved to /var/cache/conftool/dbconfig/20230417-101131-ladsgroup.json
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46972 and previous config saved to /var/cache/conftool/dbconfig/20230417-100003-root.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46971 and previous config saved to /var/cache/conftool/dbconfig/20230417-095625-ladsgroup.json
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T333332)', diff saved to https://phabricator.wikimedia.org/P46970 and previous config saved to /var/cache/conftool/dbconfig/20230417-095404-ladsgroup.json
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 09:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46969 and previous config saved to /var/cache/conftool/dbconfig/20230417-095311-ladsgroup.json
  • 09:48 ladsgroup@deploy2002: Finished scap: Backport for Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702) (duration: 44m 21s)
  • 09:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46968 and previous config saved to /var/cache/conftool/dbconfig/20230417-094459-root.json
  • 09:38 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1116.eqiad.wmnet with reason: T334066
  • 09:38 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1116.eqiad.wmnet with reason: T334066
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P46967 and previous config saved to /var/cache/conftool/dbconfig/20230417-093804-ladsgroup.json
  • 09:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46966 and previous config saved to /var/cache/conftool/dbconfig/20230417-092954-root.json
  • 09:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P46965 and previous config saved to /var/cache/conftool/dbconfig/20230417-092258-ladsgroup.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46964 and previous config saved to /var/cache/conftool/dbconfig/20230417-091449-root.json
  • 09:12 ladsgroup@deploy2002: jmm and ladsgroup: Backport for Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 09:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46963 and previous config saved to /var/cache/conftool/dbconfig/20230417-090751-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T333332)', diff saved to https://phabricator.wikimedia.org/P46962 and previous config saved to /var/cache/conftool/dbconfig/20230417-090535-ladsgroup.json
  • 09:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 09:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46961 and previous config saved to /var/cache/conftool/dbconfig/20230417-090512-ladsgroup.json
  • 09:04 ladsgroup@deploy2002: Started scap: Backport for Also broadcast RCFeed/IRC events to irc1002/irc2002 (T331702)
  • 09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow6002.drmrs.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
  • 09:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 09:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46960 and previous config saved to /var/cache/conftool/dbconfig/20230417-085944-root.json
  • 08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
  • 08:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:58 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:57 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 100%: Maint done', diff saved to https://phabricator.wikimedia.org/P46959 and previous config saved to /var/cache/conftool/dbconfig/20230417-085623-ladsgroup.json
  • 08:55 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow6002.drmrs.wmnet
  • 08:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 08:54 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 08:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netflow6002.drmrs.wmnet
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
  • 08:52 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P46958 and previous config saved to /var/cache/conftool/dbconfig/20230417-085005-ladsgroup.json
  • 08:48 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46957 and previous config saved to /var/cache/conftool/dbconfig/20230417-084439-root.json
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P46956 and previous config saved to /var/cache/conftool/dbconfig/20230417-084118-ladsgroup.json
  • 08:39 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:39 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P46955 and previous config saved to /var/cache/conftool/dbconfig/20230417-083459-ladsgroup.json
  • 08:34 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:33 jmm@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db1112 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46954 and previous config saved to /var/cache/conftool/dbconfig/20230417-082934-root.json
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P46953 and previous config saved to /var/cache/conftool/dbconfig/20230417-082613-ladsgroup.json
  • 08:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46952 and previous config saved to /var/cache/conftool/dbconfig/20230417-081953-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46951 and previous config saved to /var/cache/conftool/dbconfig/20230417-081732-ladsgroup.json
  • 08:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 08:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 10%: Maint done', diff saved to https://phabricator.wikimedia.org/P46950 and previous config saved to /var/cache/conftool/dbconfig/20230417-081108-ladsgroup.json
  • 08:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 08:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1100.eqiad.wmnet
  • 07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1100.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 100%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46948 and previous config saved to /var/cache/conftool/dbconfig/20230417-075818-root.json
  • 07:57 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1100.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:55 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:54 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1100.eqiad.wmnet
  • 07:49 vgutierrez: restart haproxy on cp3054 - T334448
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netflow6002.drmrs.wmnet on all recursors
  • 07:44 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netflow6002.drmrs.wmnet on all recursors
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 07:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netflow6002.drmrs.wmnet - jmm@cumin2002"
  • 07:43 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 75%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46946 and previous config saved to /var/cache/conftool/dbconfig/20230417-074313-root.json
  • 07:36 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:36 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host netflow6002.drmrs.wmnet
  • 07:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 50%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46945 and previous config saved to /var/cache/conftool/dbconfig/20230417-072809-root.json
  • 07:13 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 25%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46944 and previous config saved to /var/cache/conftool/dbconfig/20230417-071304-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 10%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46943 and previous config saved to /var/cache/conftool/dbconfig/20230417-065759-root.json
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T334820', diff saved to https://phabricator.wikimedia.org/P46942 and previous config saved to /var/cache/conftool/dbconfig/20230417-064525-marostegui.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 5%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46941 and previous config saved to /var/cache/conftool/dbconfig/20230417-064254-root.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 4%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46940 and previous config saved to /var/cache/conftool/dbconfig/20230417-062749-root.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 3%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46939 and previous config saved to /var/cache/conftool/dbconfig/20230417-061244-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 2%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46938 and previous config saved to /var/cache/conftool/dbconfig/20230417-055739-root.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'Change db1152 weight', diff saved to https://phabricator.wikimedia.org/P46937 and previous config saved to /var/cache/conftool/dbconfig/20230417-055721-root.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1152 to x2 primary T334663', diff saved to https://phabricator.wikimedia.org/P46936 and previous config saved to /var/cache/conftool/dbconfig/20230417-055644-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1214 (re)pooling @ 1%: Pooling db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46935 and previous config saved to /var/cache/conftool/dbconfig/20230417-054235-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1214 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46934 and previous config saved to /var/cache/conftool/dbconfig/20230417-054154-marostegui.json
  • 05:33 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1100 from dbctl T329352', diff saved to https://phabricator.wikimedia.org/P46933 and previous config saved to /var/cache/conftool/dbconfig/20230417-053310-marostegui.json
  • 05:32 marostegui: Stop MariaDB on db1112 to clone db1212 - this will generate lag on s3 wiki replicas
  • 05:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1112 T326669', diff saved to https://phabricator.wikimedia.org/P46931 and previous config saved to /var/cache/conftool/dbconfig/20230417-051903-marostegui.json
  • 04:48 phedenskog@deploy2002: Finished deploy [performance/navtiming@e21f08f]: (no justification provided) (duration: 00m 06s)
  • 04:48 phedenskog@deploy2002: Started deploy [performance/navtiming@e21f08f]: (no justification provided)

2023-04-16

  • 07:54 vgutierrez: restart haproxy on cp2033 to clear unexpected service restart alerts - T334448
  • 01:49 legoktm: legoktm@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki commonswiki "Commons:Picture of the Year/2021/Help" "Commons:Picture of the Year/Help" "Legoktm" --reason "make non-year specific" --skip-talkpages

2023-04-15

  • 07:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46929 and previous config saved to /var/cache/conftool/dbconfig/20230415-071327-ladsgroup.json
  • 06:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P46928 and previous config saved to /var/cache/conftool/dbconfig/20230415-065821-ladsgroup.json
  • 06:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P46927 and previous config saved to /var/cache/conftool/dbconfig/20230415-064314-ladsgroup.json
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46926 and previous config saved to /var/cache/conftool/dbconfig/20230415-062808-ladsgroup.json
  • 06:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T333332)', diff saved to https://phabricator.wikimedia.org/P46925 and previous config saved to /var/cache/conftool/dbconfig/20230415-062558-ladsgroup.json
  • 06:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46924 and previous config saved to /var/cache/conftool/dbconfig/20230415-062534-ladsgroup.json
  • 06:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P46923 and previous config saved to /var/cache/conftool/dbconfig/20230415-061028-ladsgroup.json
  • 05:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P46922 and previous config saved to /var/cache/conftool/dbconfig/20230415-055521-ladsgroup.json
  • 05:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46921 and previous config saved to /var/cache/conftool/dbconfig/20230415-054015-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46920 and previous config saved to /var/cache/conftool/dbconfig/20230415-053804-ladsgroup.json
  • 05:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 05:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 05:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46919 and previous config saved to /var/cache/conftool/dbconfig/20230415-053752-ladsgroup.json
  • 05:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P46918 and previous config saved to /var/cache/conftool/dbconfig/20230415-052246-ladsgroup.json
  • 05:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P46917 and previous config saved to /var/cache/conftool/dbconfig/20230415-050739-ladsgroup.json
  • 04:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46916 and previous config saved to /var/cache/conftool/dbconfig/20230415-045233-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T333332)', diff saved to https://phabricator.wikimedia.org/P46915 and previous config saved to /var/cache/conftool/dbconfig/20230415-045023-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46914 and previous config saved to /var/cache/conftool/dbconfig/20230415-044959-ladsgroup.json
  • 04:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P46913 and previous config saved to /var/cache/conftool/dbconfig/20230415-043453-ladsgroup.json
  • 04:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P46912 and previous config saved to /var/cache/conftool/dbconfig/20230415-041947-ladsgroup.json
  • 04:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46911 and previous config saved to /var/cache/conftool/dbconfig/20230415-040440-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46910 and previous config saved to /var/cache/conftool/dbconfig/20230415-040230-ladsgroup.json
  • 04:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 04:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 04:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46909 and previous config saved to /var/cache/conftool/dbconfig/20230415-040207-ladsgroup.json
  • 03:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P46908 and previous config saved to /var/cache/conftool/dbconfig/20230415-034700-ladsgroup.json
  • 03:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P46907 and previous config saved to /var/cache/conftool/dbconfig/20230415-033154-ladsgroup.json
  • 03:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46906 and previous config saved to /var/cache/conftool/dbconfig/20230415-031648-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T333332)', diff saved to https://phabricator.wikimedia.org/P46905 and previous config saved to /var/cache/conftool/dbconfig/20230415-031437-ladsgroup.json
  • 03:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 03:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 03:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 03:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 03:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46904 and previous config saved to /var/cache/conftool/dbconfig/20230415-031356-ladsgroup.json
  • 02:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P46903 and previous config saved to /var/cache/conftool/dbconfig/20230415-025850-ladsgroup.json
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P46902 and previous config saved to /var/cache/conftool/dbconfig/20230415-024344-ladsgroup.json
  • 02:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46901 and previous config saved to /var/cache/conftool/dbconfig/20230415-022837-ladsgroup.json
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T333332)', diff saved to https://phabricator.wikimedia.org/P46900 and previous config saved to /var/cache/conftool/dbconfig/20230415-022627-ladsgroup.json
  • 02:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 02:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 02:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46899 and previous config saved to /var/cache/conftool/dbconfig/20230415-022604-ladsgroup.json
  • 02:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P46898 and previous config saved to /var/cache/conftool/dbconfig/20230415-021057-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P46897 and previous config saved to /var/cache/conftool/dbconfig/20230415-015551-ladsgroup.json
  • 01:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46896 and previous config saved to /var/cache/conftool/dbconfig/20230415-014045-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T333332)', diff saved to https://phabricator.wikimedia.org/P46895 and previous config saved to /var/cache/conftool/dbconfig/20230415-013835-ladsgroup.json
  • 01:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46894 and previous config saved to /var/cache/conftool/dbconfig/20230415-013811-ladsgroup.json
  • 01:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46893 and previous config saved to /var/cache/conftool/dbconfig/20230415-012753-ladsgroup.json
  • 01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P46892 and previous config saved to /var/cache/conftool/dbconfig/20230415-012305-ladsgroup.json
  • 01:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46891 and previous config saved to /var/cache/conftool/dbconfig/20230415-011246-ladsgroup.json
  • 01:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P46890 and previous config saved to /var/cache/conftool/dbconfig/20230415-010759-ladsgroup.json
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46889 and previous config saved to /var/cache/conftool/dbconfig/20230415-005740-ladsgroup.json
  • 00:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46888 and previous config saved to /var/cache/conftool/dbconfig/20230415-005252-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T333332)', diff saved to https://phabricator.wikimedia.org/P46887 and previous config saved to /var/cache/conftool/dbconfig/20230415-005042-ladsgroup.json
  • 00:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46886 and previous config saved to /var/cache/conftool/dbconfig/20230415-005019-ladsgroup.json
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46885 and previous config saved to /var/cache/conftool/dbconfig/20230415-004233-ladsgroup.json
  • 00:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P46884 and previous config saved to /var/cache/conftool/dbconfig/20230415-003512-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46883 and previous config saved to /var/cache/conftool/dbconfig/20230415-003315-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46882 and previous config saved to /var/cache/conftool/dbconfig/20230415-003251-ladsgroup.json
  • 00:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P46881 and previous config saved to /var/cache/conftool/dbconfig/20230415-002006-ladsgroup.json
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46880 and previous config saved to /var/cache/conftool/dbconfig/20230415-001745-ladsgroup.json
  • 00:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46879 and previous config saved to /var/cache/conftool/dbconfig/20230415-000500-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T333332)', diff saved to https://phabricator.wikimedia.org/P46878 and previous config saved to /var/cache/conftool/dbconfig/20230415-000249-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46877 and previous config saved to /var/cache/conftool/dbconfig/20230415-000239-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46876 and previous config saved to /var/cache/conftool/dbconfig/20230415-000226-ladsgroup.json

2023-04-14

  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46875 and previous config saved to /var/cache/conftool/dbconfig/20230414-234732-ladsgroup.json
  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P46874 and previous config saved to /var/cache/conftool/dbconfig/20230414-234720-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46873 and previous config saved to /var/cache/conftool/dbconfig/20230414-234516-ladsgroup.json
  • 23:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 23:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46872 and previous config saved to /var/cache/conftool/dbconfig/20230414-234453-ladsgroup.json
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P46871 and previous config saved to /var/cache/conftool/dbconfig/20230414-233213-ladsgroup.json
  • 23:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46870 and previous config saved to /var/cache/conftool/dbconfig/20230414-232946-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46869 and previous config saved to /var/cache/conftool/dbconfig/20230414-231707-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T333332)', diff saved to https://phabricator.wikimedia.org/P46868 and previous config saved to /var/cache/conftool/dbconfig/20230414-231557-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46867 and previous config saved to /var/cache/conftool/dbconfig/20230414-231440-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46866 and previous config saved to /var/cache/conftool/dbconfig/20230414-231440-ladsgroup.json
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P46865 and previous config saved to /var/cache/conftool/dbconfig/20230414-225934-ladsgroup.json
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46864 and previous config saved to /var/cache/conftool/dbconfig/20230414-225934-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46863 and previous config saved to /var/cache/conftool/dbconfig/20230414-225717-ladsgroup.json
  • 22:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 22:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46862 and previous config saved to /var/cache/conftool/dbconfig/20230414-225654-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P46861 and previous config saved to /var/cache/conftool/dbconfig/20230414-224428-ladsgroup.json
  • 22:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46860 and previous config saved to /var/cache/conftool/dbconfig/20230414-224147-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46859 and previous config saved to /var/cache/conftool/dbconfig/20230414-222921-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1211 (T333332)', diff saved to https://phabricator.wikimedia.org/P46858 and previous config saved to /var/cache/conftool/dbconfig/20230414-222814-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 22:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 22:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46857 and previous config saved to /var/cache/conftool/dbconfig/20230414-222750-ladsgroup.json
  • 22:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46856 and previous config saved to /var/cache/conftool/dbconfig/20230414-222641-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P46855 and previous config saved to /var/cache/conftool/dbconfig/20230414-221244-ladsgroup.json
  • 22:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46854 and previous config saved to /var/cache/conftool/dbconfig/20230414-221134-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46853 and previous config saved to /var/cache/conftool/dbconfig/20230414-220918-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 22:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46852 and previous config saved to /var/cache/conftool/dbconfig/20230414-220838-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P46851 and previous config saved to /var/cache/conftool/dbconfig/20230414-215738-ladsgroup.json
  • 21:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46850 and previous config saved to /var/cache/conftool/dbconfig/20230414-215331-ladsgroup.json
  • 21:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46849 and previous config saved to /var/cache/conftool/dbconfig/20230414-214231-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1209 (T333332)', diff saved to https://phabricator.wikimedia.org/P46848 and previous config saved to /var/cache/conftool/dbconfig/20230414-214123-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46847 and previous config saved to /var/cache/conftool/dbconfig/20230414-214100-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46846 and previous config saved to /var/cache/conftool/dbconfig/20230414-213825-ladsgroup.json
  • 21:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P46845 and previous config saved to /var/cache/conftool/dbconfig/20230414-212554-ladsgroup.json
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46844 and previous config saved to /var/cache/conftool/dbconfig/20230414-212319-ladsgroup.json
  • 21:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46843 and previous config saved to /var/cache/conftool/dbconfig/20230414-212102-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46842 and previous config saved to /var/cache/conftool/dbconfig/20230414-212039-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P46841 and previous config saved to /var/cache/conftool/dbconfig/20230414-211048-ladsgroup.json
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46840 and previous config saved to /var/cache/conftool/dbconfig/20230414-210533-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46838 and previous config saved to /var/cache/conftool/dbconfig/20230414-205541-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T333332)', diff saved to https://phabricator.wikimedia.org/P46837 and previous config saved to /var/cache/conftool/dbconfig/20230414-205333-ladsgroup.json
  • 20:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 20:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46836 and previous config saved to /var/cache/conftool/dbconfig/20230414-205310-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46835 and previous config saved to /var/cache/conftool/dbconfig/20230414-205026-ladsgroup.json
  • 20:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P46834 and previous config saved to /var/cache/conftool/dbconfig/20230414-203804-ladsgroup.json
  • 20:36 papaul: rebooting labstore1004 for mgmt interface issue
  • 20:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46833 and previous config saved to /var/cache/conftool/dbconfig/20230414-203520-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46832 and previous config saved to /var/cache/conftool/dbconfig/20230414-203304-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46831 and previous config saved to /var/cache/conftool/dbconfig/20230414-203241-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1207 (T333332)', diff saved to https://phabricator.wikimedia.org/P46830 and previous config saved to /var/cache/conftool/dbconfig/20230414-203220-ladsgroup.json
  • 20:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 20:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46829 and previous config saved to /var/cache/conftool/dbconfig/20230414-203156-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P46828 and previous config saved to /var/cache/conftool/dbconfig/20230414-202257-ladsgroup.json
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46827 and previous config saved to /var/cache/conftool/dbconfig/20230414-201734-ladsgroup.json
  • 20:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P46826 and previous config saved to /var/cache/conftool/dbconfig/20230414-201650-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46825 and previous config saved to /var/cache/conftool/dbconfig/20230414-200751-ladsgroup.json
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T333332)', diff saved to https://phabricator.wikimedia.org/P46824 and previous config saved to /var/cache/conftool/dbconfig/20230414-200543-ladsgroup.json
  • 20:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46823 and previous config saved to /var/cache/conftool/dbconfig/20230414-200520-ladsgroup.json
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46822 and previous config saved to /var/cache/conftool/dbconfig/20230414-200226-ladsgroup.json
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P46821 and previous config saved to /var/cache/conftool/dbconfig/20230414-200144-ladsgroup.json
  • 19:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P46820 and previous config saved to /var/cache/conftool/dbconfig/20230414-195014-ladsgroup.json
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46819 and previous config saved to /var/cache/conftool/dbconfig/20230414-194720-ladsgroup.json
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46818 and previous config saved to /var/cache/conftool/dbconfig/20230414-194637-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46817 and previous config saved to /var/cache/conftool/dbconfig/20230414-194504-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46816 and previous config saved to /var/cache/conftool/dbconfig/20230414-194441-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1206 (T333332)', diff saved to https://phabricator.wikimedia.org/P46815 and previous config saved to /var/cache/conftool/dbconfig/20230414-194424-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46814 and previous config saved to /var/cache/conftool/dbconfig/20230414-194401-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P46813 and previous config saved to /var/cache/conftool/dbconfig/20230414-193507-ladsgroup.json
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46812 and previous config saved to /var/cache/conftool/dbconfig/20230414-192934-ladsgroup.json
  • 19:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P46811 and previous config saved to /var/cache/conftool/dbconfig/20230414-192855-ladsgroup.json
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46810 and previous config saved to /var/cache/conftool/dbconfig/20230414-192001-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T333332)', diff saved to https://phabricator.wikimedia.org/P46809 and previous config saved to /var/cache/conftool/dbconfig/20230414-191854-ladsgroup.json
  • 19:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46808 and previous config saved to /var/cache/conftool/dbconfig/20230414-191831-ladsgroup.json
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46807 and previous config saved to /var/cache/conftool/dbconfig/20230414-191428-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P46806 and previous config saved to /var/cache/conftool/dbconfig/20230414-191348-ladsgroup.json
  • 19:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P46805 and previous config saved to /var/cache/conftool/dbconfig/20230414-190324-ladsgroup.json
  • 18:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46804 and previous config saved to /var/cache/conftool/dbconfig/20230414-185921-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46803 and previous config saved to /var/cache/conftool/dbconfig/20230414-185842-ladsgroup.json
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46802 and previous config saved to /var/cache/conftool/dbconfig/20230414-185705-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46801 and previous config saved to /var/cache/conftool/dbconfig/20230414-185642-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T333332)', diff saved to https://phabricator.wikimedia.org/P46800 and previous config saved to /var/cache/conftool/dbconfig/20230414-185630-ladsgroup.json
  • 18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 18:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46799 and previous config saved to /var/cache/conftool/dbconfig/20230414-185545-ladsgroup.json
  • 18:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P46798 and previous config saved to /var/cache/conftool/dbconfig/20230414-184818-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46797 and previous config saved to /var/cache/conftool/dbconfig/20230414-184135-ladsgroup.json
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P46796 and previous config saved to /var/cache/conftool/dbconfig/20230414-184038-ladsgroup.json
  • 18:36 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 18:33 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 18:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46795 and previous config saved to /var/cache/conftool/dbconfig/20230414-183311-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46794 and previous config saved to /var/cache/conftool/dbconfig/20230414-182629-ladsgroup.json
  • 18:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P46793 and previous config saved to /var/cache/conftool/dbconfig/20230414-182532-ladsgroup.json
  • 18:18 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:17 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46792 and previous config saved to /var/cache/conftool/dbconfig/20230414-181123-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46791 and previous config saved to /var/cache/conftool/dbconfig/20230414-181025-ladsgroup.json
  • 18:09 mutante: doc1002, doc2001 - manually remove php7.3-fpm restart timers to fix T334735 and alerting - T322357 - systemctl stop wmf_auto_restart_php7.3-fpm.timer; systemctl stop wmf_auto_restart_php7.3-fpm.service; rm /lib/systemd/system/wmf_auto_restart_php7.3-fpm.*
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T333332)', diff saved to https://phabricator.wikimedia.org/P46790 and previous config saved to /var/cache/conftool/dbconfig/20230414-180812-ladsgroup.json
  • 18:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46789 and previous config saved to /var/cache/conftool/dbconfig/20230414-180748-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46788 and previous config saved to /var/cache/conftool/dbconfig/20230414-180606-ladsgroup.json
  • 18:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 18:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46787 and previous config saved to /var/cache/conftool/dbconfig/20230414-180430-ladsgroup.json
  • 18:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:03 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bullseye
  • 17:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS bullseye
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P46786 and previous config saved to /var/cache/conftool/dbconfig/20230414-175242-ladsgroup.json
  • 17:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46785 and previous config saved to /var/cache/conftool/dbconfig/20230414-174924-ladsgroup.json
  • 17:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet
  • 17:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T333332)', diff saved to https://phabricator.wikimedia.org/P46784 and previous config saved to /var/cache/conftool/dbconfig/20230414-174356-ladsgroup.json
  • 17:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46783 and previous config saved to /var/cache/conftool/dbconfig/20230414-174333-ladsgroup.json
  • 17:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 17:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 17:39 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072']
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P46782 and previous config saved to /var/cache/conftool/dbconfig/20230414-173734-ladsgroup.json
  • 17:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 17:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46781 and previous config saved to /var/cache/conftool/dbconfig/20230414-173418-ladsgroup.json
  • 17:29 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P46780 and previous config saved to /var/cache/conftool/dbconfig/20230414-172826-ladsgroup.json
  • 17:27 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirtlocal1001.eqiad.wmnet
  • 17:25 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:24 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:23 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46779 and previous config saved to /var/cache/conftool/dbconfig/20230414-172229-ladsgroup.json
  • 17:21 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T333332)', diff saved to https://phabricator.wikimedia.org/P46778 and previous config saved to /var/cache/conftool/dbconfig/20230414-172016-ladsgroup.json
  • 17:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46777 and previous config saved to /var/cache/conftool/dbconfig/20230414-171953-ladsgroup.json
  • 17:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46776 and previous config saved to /var/cache/conftool/dbconfig/20230414-171911-ladsgroup.json
  • 17:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bullseye
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46775 and previous config saved to /var/cache/conftool/dbconfig/20230414-171702-ladsgroup.json
  • 17:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 17:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46774 and previous config saved to /var/cache/conftool/dbconfig/20230414-171638-ladsgroup.json
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS bullseye
  • 17:15 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P46773 and previous config saved to /var/cache/conftool/dbconfig/20230414-171320-ladsgroup.json
  • 17:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:10 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P46772 and previous config saved to /var/cache/conftool/dbconfig/20230414-170447-ladsgroup.json
  • 17:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 17:04 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 17:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 17:02 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46771 and previous config saved to /var/cache/conftool/dbconfig/20230414-170133-ladsgroup.json
  • 17:00 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46770 and previous config saved to /var/cache/conftool/dbconfig/20230414-165814-ladsgroup.json
  • 16:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P46769 and previous config saved to /var/cache/conftool/dbconfig/20230414-164940-ladsgroup.json
  • 16:47 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bullseye
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46768 and previous config saved to /var/cache/conftool/dbconfig/20230414-164627-ladsgroup.json
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46767 and previous config saved to /var/cache/conftool/dbconfig/20230414-163434-ladsgroup.json
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T333332)', diff saved to https://phabricator.wikimedia.org/P46766 and previous config saved to /var/cache/conftool/dbconfig/20230414-163221-ladsgroup.json
  • 16:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 16:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46765 and previous config saved to /var/cache/conftool/dbconfig/20230414-163120-ladsgroup.json
  • 16:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46764 and previous config saved to /var/cache/conftool/dbconfig/20230414-163110-ladsgroup.json
  • 16:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46763 and previous config saved to /var/cache/conftool/dbconfig/20230414-162911-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46762 and previous config saved to /var/cache/conftool/dbconfig/20230414-162848-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P46761 and previous config saved to /var/cache/conftool/dbconfig/20230414-161604-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46760 and previous config saved to /var/cache/conftool/dbconfig/20230414-161341-ladsgroup.json
  • 16:06 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bullseye
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P46759 and previous config saved to /var/cache/conftool/dbconfig/20230414-160058-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46758 and previous config saved to /var/cache/conftool/dbconfig/20230414-155835-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46757 and previous config saved to /var/cache/conftool/dbconfig/20230414-155758-ladsgroup.json
  • 15:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46756 and previous config saved to /var/cache/conftool/dbconfig/20230414-155735-ladsgroup.json
  • 15:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 15:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:50 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46755 and previous config saved to /var/cache/conftool/dbconfig/20230414-154551-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T333332)', diff saved to https://phabricator.wikimedia.org/P46754 and previous config saved to /var/cache/conftool/dbconfig/20230414-154339-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46753 and previous config saved to /var/cache/conftool/dbconfig/20230414-154329-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46752 and previous config saved to /var/cache/conftool/dbconfig/20230414-154316-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P46751 and previous config saved to /var/cache/conftool/dbconfig/20230414-154228-ladsgroup.json
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46750 and previous config saved to /var/cache/conftool/dbconfig/20230414-154119-ladsgroup.json
  • 15:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46749 and previous config saved to /var/cache/conftool/dbconfig/20230414-154056-ladsgroup.json
  • 15:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P46748 and previous config saved to /var/cache/conftool/dbconfig/20230414-152809-ladsgroup.json
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P46747 and previous config saved to /var/cache/conftool/dbconfig/20230414-152722-ladsgroup.json
  • 15:26 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46746 and previous config saved to /var/cache/conftool/dbconfig/20230414-152550-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P46745 and previous config saved to /var/cache/conftool/dbconfig/20230414-151303-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46744 and previous config saved to /var/cache/conftool/dbconfig/20230414-151216-ladsgroup.json
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46743 and previous config saved to /var/cache/conftool/dbconfig/20230414-151108-ladsgroup.json
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46742 and previous config saved to /var/cache/conftool/dbconfig/20230414-151043-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46741 and previous config saved to /var/cache/conftool/dbconfig/20230414-151037-ladsgroup.json
  • 15:04 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:04 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46740 and previous config saved to /var/cache/conftool/dbconfig/20230414-145756-ladsgroup.json
  • 14:55 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw1349.eqiad.wmnet
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T333332)', diff saved to https://phabricator.wikimedia.org/P46739 and previous config saved to /var/cache/conftool/dbconfig/20230414-145544-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46738 and previous config saved to /var/cache/conftool/dbconfig/20230414-145537-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P46737 and previous config saved to /var/cache/conftool/dbconfig/20230414-145531-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46736 and previous config saved to /var/cache/conftool/dbconfig/20230414-145521-ladsgroup.json
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46735 and previous config saved to /var/cache/conftool/dbconfig/20230414-145327-ladsgroup.json
  • 14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 14:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46734 and previous config saved to /var/cache/conftool/dbconfig/20230414-145245-ladsgroup.json
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pybal-test2002.codfw.wmnet
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:49 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pybal-test2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:48 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pybal-test2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 14:45 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P46733 and previous config saved to /var/cache/conftool/dbconfig/20230414-144024-ladsgroup.json
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P46732 and previous config saved to /var/cache/conftool/dbconfig/20230414-144014-ladsgroup.json
  • 14:38 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:38 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mngmt dns fundrasing - jclark@cumin1001"
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts pybal-test2002.codfw.wmnet
  • 14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pybal-test2001.codfw.wmnet
  • 14:38 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46731 and previous config saved to /var/cache/conftool/dbconfig/20230414-143738-ladsgroup.json
  • 14:37 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mngmt dns fundrasing - jclark@cumin1001"
  • 14:36 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 14:35 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:32 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts pybal-test2001.codfw.wmnet
  • 14:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 14:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 14:29 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 14:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46730 and previous config saved to /var/cache/conftool/dbconfig/20230414-142518-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P46729 and previous config saved to /var/cache/conftool/dbconfig/20230414-142508-ladsgroup.json
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46728 and previous config saved to /var/cache/conftool/dbconfig/20230414-142232-ladsgroup.json
  • 14:21 claime: rebooting list1001 for cpu bump
  • 14:11 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 14:11 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 14:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46727 and previous config saved to /var/cache/conftool/dbconfig/20230414-141002-ladsgroup.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T333332)', diff saved to https://phabricator.wikimedia.org/P46726 and previous config saved to /var/cache/conftool/dbconfig/20230414-140749-ladsgroup.json
  • 14:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 14:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46725 and previous config saved to /var/cache/conftool/dbconfig/20230414-140725-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46724 and previous config saved to /var/cache/conftool/dbconfig/20230414-140616-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46723 and previous config saved to /var/cache/conftool/dbconfig/20230414-140553-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T333332)', diff saved to https://phabricator.wikimedia.org/P46722 and previous config saved to /var/cache/conftool/dbconfig/20230414-140401-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46721 and previous config saved to /var/cache/conftool/dbconfig/20230414-140258-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P46720 and previous config saved to /var/cache/conftool/dbconfig/20230414-135220-ladsgroup.json
  • 13:51 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46719 and previous config saved to /var/cache/conftool/dbconfig/20230414-135047-ladsgroup.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P46718 and previous config saved to /var/cache/conftool/dbconfig/20230414-134751-ladsgroup.json
  • 13:45 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:37 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS buster
  • 13:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P46717 and previous config saved to /var/cache/conftool/dbconfig/20230414-133714-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46716 and previous config saved to /var/cache/conftool/dbconfig/20230414-133540-ladsgroup.json
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P46715 and previous config saved to /var/cache/conftool/dbconfig/20230414-133245-ladsgroup.json
  • 13:31 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:30 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T333332)', diff saved to https://phabricator.wikimedia.org/P46714 and previous config saved to /var/cache/conftool/dbconfig/20230414-132208-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46713 and previous config saved to /var/cache/conftool/dbconfig/20230414-132034-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T333332)', diff saved to https://phabricator.wikimedia.org/P46712 and previous config saved to /var/cache/conftool/dbconfig/20230414-131956-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46711 and previous config saved to /var/cache/conftool/dbconfig/20230414-131932-ladsgroup.json
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46710 and previous config saved to /var/cache/conftool/dbconfig/20230414-131824-ladsgroup.json
  • 13:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46709 and previous config saved to /var/cache/conftool/dbconfig/20230414-131739-ladsgroup.json
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46708 and previous config saved to /var/cache/conftool/dbconfig/20230414-131631-ladsgroup.json
  • 13:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 13:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46707 and previous config saved to /var/cache/conftool/dbconfig/20230414-131607-ladsgroup.json
  • 13:11 ottomata: granting IdempotentWrite on kafka jumbo-eqiad cluster to User:ANONYMOUS - this will allow for use of newer kafka producers that have enabled transactional writes by default. `kafka acls --add --allow-principal User:ANONYMOUS --cluster --operation IdempotentWrite`
  • 13:07 ottomata: creating User:ANONYMOUS ACLs on kafka-test cluster https://wikitech.wikimedia.org/wiki/Kafka/Administration#Kafka_ACLs
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P46706 and previous config saved to /var/cache/conftool/dbconfig/20230414-130426-ladsgroup.json
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46705 and previous config saved to /var/cache/conftool/dbconfig/20230414-130234-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P46704 and previous config saved to /var/cache/conftool/dbconfig/20230414-130101-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P46703 and previous config saved to /var/cache/conftool/dbconfig/20230414-124920-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46702 and previous config saved to /var/cache/conftool/dbconfig/20230414-124727-ladsgroup.json
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P46701 and previous config saved to /var/cache/conftool/dbconfig/20230414-124553-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46700 and previous config saved to /var/cache/conftool/dbconfig/20230414-123413-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46699 and previous config saved to /var/cache/conftool/dbconfig/20230414-123221-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46698 and previous config saved to /var/cache/conftool/dbconfig/20230414-123201-ladsgroup.json
  • 12:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46697 and previous config saved to /var/cache/conftool/dbconfig/20230414-123138-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46696 and previous config saved to /var/cache/conftool/dbconfig/20230414-123047-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46695 and previous config saved to /var/cache/conftool/dbconfig/20230414-123011-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46694 and previous config saved to /var/cache/conftool/dbconfig/20230414-122948-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T333332)', diff saved to https://phabricator.wikimedia.org/P46693 and previous config saved to /var/cache/conftool/dbconfig/20230414-122939-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46692 and previous config saved to /var/cache/conftool/dbconfig/20230414-122915-ladsgroup.json
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P46691 and previous config saved to /var/cache/conftool/dbconfig/20230414-121632-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46690 and previous config saved to /var/cache/conftool/dbconfig/20230414-121442-ladsgroup.json
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P46689 and previous config saved to /var/cache/conftool/dbconfig/20230414-121409-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P46688 and previous config saved to /var/cache/conftool/dbconfig/20230414-120125-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46687 and previous config saved to /var/cache/conftool/dbconfig/20230414-115935-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P46686 and previous config saved to /var/cache/conftool/dbconfig/20230414-115903-ladsgroup.json
  • 11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46685 and previous config saved to /var/cache/conftool/dbconfig/20230414-114619-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46684 and previous config saved to /var/cache/conftool/dbconfig/20230414-114429-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T333332)', diff saved to https://phabricator.wikimedia.org/P46683 and previous config saved to /var/cache/conftool/dbconfig/20230414-114407-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46682 and previous config saved to /var/cache/conftool/dbconfig/20230414-114356-ladsgroup.json
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46681 and previous config saved to /var/cache/conftool/dbconfig/20230414-114219-ladsgroup.json
  • 11:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46680 and previous config saved to /var/cache/conftool/dbconfig/20230414-114148-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 11:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 11:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 10:49 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1120.eqiad.wmnet
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1120.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 10:39 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1120.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 10:37 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 10:32 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1120.eqiad.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 10:08 kamila@deploy2002: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 09:53 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2.*.codfw.wmnet,cluster=api_appserver
  • 09:53 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2.*.codfw.wmnet,cluster=appserver
  • 09:45 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 09:22 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=parsoid
  • 09:21 kamila@deploy2002: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 09:16 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2002.codfw.wmnet with reason: systemd package upgrade
  • 09:16 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2002.codfw.wmnet with reason: systemd package upgrade
  • 08:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:21 arturo: aborrero@apt2001:~ $ sudo -i reprepro --noskipold --component thirdparty/kubeadm-k8s-1-23 update buster-wikimedia (T298005)
  • 07:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:39 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1100 T329352', diff saved to https://phabricator.wikimedia.org/P46679 and previous config saved to /var/cache/conftool/dbconfig/20230414-062553-marostegui.json
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1107.eqiad.wmnet
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:09 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1107.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:08 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1107.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:06 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:01 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1107.eqiad.wmnet
  • 04:04 ejegg: SmashPig upgraded from 24d700f4 to db9fa965
  • 01:37 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 11s)
  • 01:37 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 01:07 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
  • 01:07 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 01:01 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 00:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 00:04 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
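
The cookbook entries above follow a regular START / END (STATUS) shape, so failed or errored runs can be picked out mechanically. What follows is a minimal sketch, not part of the log or of any Wikimedia tooling: the regular expressions and the `Entry`, `parse_entry` and `failed_runs` names are assumptions made for illustration, and the sample lines are copied from the entries above.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional, Tuple

# Assumed shape of one log line: optional bullet, "HH:MM actor: message".
ENTRY_RE = re.compile(
    r"^\s*(?:•\s*)?(?P<time>\d{2}:\d{2}) (?P<actor>[^:\s]+): (?P<message>.*)$"
)
# Assumed shape of the closing line of a cookbook run.
END_RE = re.compile(
    r"^END \((?P<status>[A-Z]+)\) - Cookbook (?P<cookbook>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


class Entry(NamedTuple):
    time: str
    actor: str
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Split a log line into time, actor and message; None if it does not fit."""
    m = ENTRY_RE.match(line)
    return Entry(m["time"], m["actor"], m["message"]) if m else None


def failed_runs(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, int]]:
    """Yield (time, actor, cookbook, exit_code) for END lines that did not PASS."""
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        end = END_RE.match(entry.message)
        if end and end["status"] != "PASS":
            yield entry.time, entry.actor, end["cookbook"], int(end["code"])


# Sample lines copied from the 2023-04-14 entries above.
SAMPLE = [
    "  • 10:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm",
    "  • 10:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1120.eqiad.wmnet",
    "  • 00:04 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye",
]

if __name__ == "__main__":
    for run in failed_runs(SAMPLE):
        print(run)
```

Lines that do not match the assumed shape (date headings, blank lines) are skipped rather than raising, since the log mixes free-form entries with tool-generated ones.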

2023-04-13

  • 23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 23:41 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 23:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 23:16 ejegg: civicrm upgraded from 2d5ede8d to cd0f886d
  • 22:00 ryankemper: T333656 `ryankemper@dns1001:~$ sudo -i authdns-update` after merge of https://gerrit.wikimedia.org/r/905754 => `OK - authdns-update successful on all nodes!`
  • 21:38 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 21:37 SandraEbele: Successfully deployed analytics refinery using scap, then deployed onto HDFS.
  • 21:28 mutante: https://query-preview.wikidata.org has been deactivated at ATS layer - T333656
  • 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 21:25 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 21:10 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
  • 21:03 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 21:02 mutante: doc1002 (doc.wikimedia.org) - switching from PHP 7.3 to 7.4 - systemctl stop php7.3-fpm, restart php7.4-fpm, apt-get remove --purge php7.3*, systemctl restart apache2. - all tests still working (on deployment server: httpbb --hosts doc1002.eqiad.wmnet /srv/deployment/httpbb-tests/doc/test_doc.yaml) T322357 T319477
  • 21:01 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 20:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on doc1002.eqiad.wmnet with reason: maintenance
  • 20:55 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on doc1002.eqiad.wmnet with reason: maintenance
  • 20:55 urbanecm@deploy2002: Finished scap: Backport for Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), Stop using redundant $wmg variable for MobileFrontend extension (T119117) (duration: 06m 26s)
  • 20:50 urbanecm@deploy2002: urbanecm and matmarex: Backport for Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), Stop using redundant $wmg variable for MobileFrontend extension (T119117) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:48 urbanecm@deploy2002: Started scap: Backport for Only log 'visualEditorFeatureUse' events if 'editAttemptStep' events are being logged (T334157), Stop using redundant $wmg variable for MobileFrontend extension (T119117)
  • 20:46 mutante: doc2001 - systemctl stop php7.3-fpm; systemctl restart php7.4-fpm - needed because after gerrit:901612 both PHP versions (7.3 and 7.4) were each running their own php-fpm process; packages for both versions were also installed, so manual package removal was needed as well - apt-get remove php7.3* T322357 T319477
  • 20:38 urbanecm@deploy2002: Finished scap: Backport for enwiki: Remove userrights from `founder` (T334692) (duration: 05m 55s)
  • 20:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 20:34 urbanecm@deploy2002: urbanecm: Backport for enwiki: Remove userrights from `founder` (T334692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:32 urbanecm@deploy2002: Started scap: Backport for enwiki: Remove userrights from `founder` (T334692)
  • 20:32 urbanecm@deploy2002: Finished scap: Backport for [wikitech] Add a logo and a wordmark for Vector 2022 (T334666) (duration: 05m 41s)
  • 20:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 20:27 urbanecm@deploy2002: superpes and urbanecm: Backport for [wikitech] Add a logo and a wordmark for Vector 2022 (T334666) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:27 mutante: doc2001 - switching PHP version from 7.3 to 7.4 for T322357
  • 20:26 urbanecm@deploy2002: Started scap: Backport for [wikitech] Add a logo and a wordmark for Vector 2022 (T334666)
  • 20:25 urbanecm@deploy2002: Finished scap: Backport for Enable mobile page tabs for everyone in ruwiki (T334395) (duration: 06m 49s)
  • 20:20 urbanecm@deploy2002: urbanecm and matmarex: Backport for Enable mobile page tabs for everyone in ruwiki (T334395) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:19 urbanecm@deploy2002: Started scap: Backport for Enable mobile page tabs for everyone in ruwiki (T334395)
  • 20:15 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 20:15 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.3 refs T330210
  • 20:14 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:55 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
  • 19:29 sukhe: restart pybal on lvs2009 to pick up bgp-med change and pool
  • 19:25 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2009
  • 19:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:25 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2009
  • 19:25 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2009
  • 19:25 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2009
  • 19:18 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:16 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 19:03 brennen@deploy2002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.3 refs T330210
  • 18:59 bblack: lvs1020: restart pybal for experiment...
  • 18:58 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:56 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2009.codfw.wmnet with OS bullseye
  • 18:46 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:45 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 18:44 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:44 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:42 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 18:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2009.codfw.wmnet with reason: host reimage
  • 18:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2009.codfw.wmnet with reason: host reimage
  • 18:34 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.4 refs T330210
  • 18:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:26 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 18:23 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 18:23 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 18:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2009.codfw.wmnet with OS bullseye
  • 18:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:07 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:57 brett: Disable Puppet/PyBal on lvs2009 in preparation for reimaging - T321309
  • 17:55 brett: restarting pybal on lvs2008 to pick up bgp-med change
  • 17:49 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: T334057
  • 17:48 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: T334057
  • 17:46 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
  • 17:46 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
  • 17:37 brett@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
  • 17:37 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2008
  • 17:37 brett@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
  • 17:36 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2008
  • 17:31 ejegg: payments-wiki upgraded from 4dcba0a9 to c01a32c4
  • 17:30 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2008.codfw.wmnet with OS bullseye
  • 17:28 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:28 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:28 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:28 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:27 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:27 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2008.codfw.wmnet with reason: host reimage
  • 17:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2008.codfw.wmnet with reason: host reimage
  • 16:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2008.codfw.wmnet with OS bullseye
  • 16:46 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:31 sukhe: sudo cumin -b1 -s30 'A:cp-text' 'ats-backend-restart': T332650
  • 16:28 jhancock@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['ms-be2067']
  • 16:28 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2067']
  • 16:27 sukhe: enable puppet on A:cp-text to merge CR 907937
  • 16:23 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:21 sukhe: disable puppet on A:cp-text to merge CR 907937
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 16:10 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 16:05 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 16:04 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 15:58 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 15:51 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac] (hadoop-test): Update druid pageview hourly and daily tables TEST [analytics/refinery@4e8f1ac] (duration: 01m 26s)
  • 15:51 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 15:51 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:50 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac] (hadoop-test): Update druid pageview hourly and daily tables TEST [analytics/refinery@4e8f1ac]
  • 15:49 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:49 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac] (thin): Update druid pageview hourly and daily tables THIN [analytics/refinery@4e8f1ac] (duration: 00m 08s)
  • 15:49 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac] (thin): Update druid pageview hourly and daily tables THIN [analytics/refinery@4e8f1ac]
  • 15:49 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 15:48 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:47 ebysans@deploy2002: Finished deploy [analytics/refinery@4e8f1ac]: Update druid pageview hourly and daily tables [analytics/refinery@4e8f1ac] (duration: 06m 24s)
  • 15:47 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 15:47 andrew@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:46 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:46 brett: Disable Puppet/PyBal on lvs2008 in preparation for reimaging - T321309
  • 15:44 andrew@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1001"
  • 15:42 SandraEbele: paused Oozie pageview-druid-hourly job.
  • 15:41 ebysans@deploy2002: Started deploy [analytics/refinery@4e8f1ac]: Update druid pageview hourly and daily tables [analytics/refinery@4e8f1ac]
  • 15:36 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2007
  • 15:36 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2007
  • 15:33 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 15:31 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:31 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:30 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 15:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 15:29 SandraEbele: deploying analytics refinery-update pageview druid table
  • 15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 15:25 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 15:25 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:24 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:24 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 15:23 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
  • 15:22 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:22 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 15:19 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 15:19 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 15:17 claime: cxserver migrated to mw-api-int on kubernetes, take three - T334204
  • 15:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 15:13 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 15:13 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 15:13 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:13 moritzm: remove runc packages installed on mw1349-mw1436, these were once used for a load test with dragonfly and are no longer needed
  • 15:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 15:11 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 15:10 claime: Migrating cxserver to mw-api-int on kubernetes, take three - T334204
  • 15:10 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 15:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 15:07 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:06 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:05 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:04 moritzm: installing unbound security updates on buster
  • 15:03 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 15:03 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 15:00 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 14:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 14:41 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:39 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:36 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 14:36 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 14:26 sukhe: restart pybal on lvs2007 to pick up bgp-med change CR 908552
  • 14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:23 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:20 moritzm: installing mariadb-10.3 security updates (as shipped in Debian, not the wmf-mariadb packages)
  • 14:19 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dse-k8s-worker1002.eqiad.wmnet
  • 14:09 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:06 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:06 kamila@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:05 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1002.eqiad.wmnet
  • 14:04 vgutierrez: rolling restart of HAProxy on A:cp-text - T334448
  • 14:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:54 sukhe: [puppetmaster] sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080; failing PCC for recently reimaged node
  • 13:45 mbsantos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:45 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bullseye
  • 13:45 mbsantos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:44 andrew@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirtlocal1003']
  • 13:44 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
  • 13:43 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 13:43 mbsantos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:42 mbsantos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46674 and previous config saved to /var/cache/conftool/dbconfig/20230413-134030-root.json
  • 13:38 mbsantos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:37 mbsantos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:33 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 13:31 sukhe@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs2007
  • 13:30 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs2007
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46673 and previous config saved to /var/cache/conftool/dbconfig/20230413-132525-root.json
  • 13:23 vgutierrez: restarting haproxy in cp5022 - T334448
  • 13:19 jgiannelos@deploy2002: Finished deploy [restbase/deploy@a08f56d]: (no justification provided) (duration: 17m 02s)
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46672 and previous config saved to /var/cache/conftool/dbconfig/20230413-131021-root.json
  • 13:04 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:02 jgiannelos@deploy2002: Started deploy [restbase/deploy@a08f56d]: (no justification provided)
  • 12:57 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:57 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:56 claime: Migrating cxserver to mw-api-int on kubernetes, take two - T334204
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46671 and previous config saved to /var/cache/conftool/dbconfig/20230413-125516-root.json
  • 12:49 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46670 and previous config saved to /var/cache/conftool/dbconfig/20230413-124011-root.json
  • 12:38 moritzm: installing systemd security updates on buster
  • 12:33 moritzm: installing Django security updates
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46669 and previous config saved to /var/cache/conftool/dbconfig/20230413-122506-root.json
  • 12:21 filippo@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3001.esams.wmnet
  • 12:21 moritzm: remove imagemagick 8:6.9.10.23+dfsg-2.1+deb10u1+wmf1 from apt.wikimedia.org (obsoleted by 8:6.9.10.23+dfsg-2.1+deb10u4 from the Debian archive) T328901
  • 12:15 filippo@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus3001.esams.wmnet
  • 12:11 moritzm: installing imagemagick security updates for buster T328901
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46668 and previous config saved to /var/cache/conftool/dbconfig/20230413-121001-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46667 and previous config saved to /var/cache/conftool/dbconfig/20230413-115456-root.json
  • 11:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46666 and previous config saved to /var/cache/conftool/dbconfig/20230413-113951-root.json
  • 11:34 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1120 from dbctl T334580', diff saved to https://phabricator.wikimedia.org/P46665 and previous config saved to /var/cache/conftool/dbconfig/20230413-113435-marostegui.json
  • 11:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1114 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46664 and previous config saved to /var/cache/conftool/dbconfig/20230413-112446-root.json
  • 11:24 moritzm: installing imagemagick security updates
  • 11:18 cgoubert@deploy2002: Finished scap: Updating mw-on-k8s certificates (duration: 01m 56s)
  • 11:16 cgoubert@deploy2002: Started scap: Updating mw-on-k8s certificates
  • 11:15 claime: Re-deploying mw-on-k8s to update certificates - T334561
  • 10:39 claime: updating appservers and api certificates - T334561
  • 10:23 Emperor: clear old 2/22/Free-object-universal-property.svg thumbs from wikipedia-commons-local-thumb.22 T334303
  • 10:15 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 10:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46662 and previous config saved to /var/cache/conftool/dbconfig/20230413-101307-root.json
  • 10:07 moritzm: installing tomcat security updates
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46661 and previous config saved to /var/cache/conftool/dbconfig/20230413-095802-root.json
  • 09:53 taavi: taavi@mwmaint2002 ~ $ mwscript emptyUserGroup.php --wiki frwikinews editor # T333750
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46660 and previous config saved to /var/cache/conftool/dbconfig/20230413-094257-root.json
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46659 and previous config saved to /var/cache/conftool/dbconfig/20230413-092752-root.json
  • 09:25 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host dse-k8s-worker1001.eqiad.wmnet
  • 09:22 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1132.eqiad.wmnet with OS buster
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 10%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46658 and previous config saved to /var/cache/conftool/dbconfig/20230413-091247-root.json
  • 09:12 elukey@cumin1001: START - Cookbook sre.hosts.reboot-single for host dse-k8s-worker1001.eqiad.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: centrallog1001.eqiad.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: centrallog1001.eqiad.wmnet
  • 09:01 stevemunene@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46657 and previous config saved to /var/cache/conftool/dbconfig/20230413-085742-root.json
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 4%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46656 and previous config saved to /var/cache/conftool/dbconfig/20230413-084238-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46655 and previous config saved to /var/cache/conftool/dbconfig/20230413-084036-root.json
  • 08:36 moritzm: installing git security updates
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46654 and previous config saved to /var/cache/conftool/dbconfig/20230413-083457-root.json
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 3%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46653 and previous config saved to /var/cache/conftool/dbconfig/20230413-082732-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46652 and previous config saved to /var/cache/conftool/dbconfig/20230413-082532-root.json
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46651 and previous config saved to /var/cache/conftool/dbconfig/20230413-081952-root.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 2%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46650 and previous config saved to /var/cache/conftool/dbconfig/20230413-081227-root.json
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46649 and previous config saved to /var/cache/conftool/dbconfig/20230413-081027-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46647 and previous config saved to /var/cache/conftool/dbconfig/20230413-080447-root.json
  • 08:00 moritzm: imported perccli 007.1910.0000.000 to bookworm-wikimedia-private T330495
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Pooling db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46646 and previous config saved to /var/cache/conftool/dbconfig/20230413-075722-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46645 and previous config saved to /var/cache/conftool/dbconfig/20230413-075522-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1223 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46644 and previous config saved to /var/cache/conftool/dbconfig/20230413-075513-marostegui.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46643 and previous config saved to /var/cache/conftool/dbconfig/20230413-074942-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46642 and previous config saved to /var/cache/conftool/dbconfig/20230413-074010-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46641 and previous config saved to /var/cache/conftool/dbconfig/20230413-073437-root.json
  • 07:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 10 hosts with reason: Cloning db1117
  • 07:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 10 hosts with reason: Cloning db1117
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46639 and previous config saved to /var/cache/conftool/dbconfig/20230413-072505-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46638 and previous config saved to /var/cache/conftool/dbconfig/20230413-071932-root.json
  • 07:14 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 07:14 slyngs: Puppet: move htcacheclean to httpd class https://gerrit.wikimedia.org/r/c/operations/puppet/+/904102
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46637 and previous config saved to /var/cache/conftool/dbconfig/20230413-071000-root.json
  • 07:09 moritzm: update bookworm installer to rc1 T330495
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46636 and previous config saved to /var/cache/conftool/dbconfig/20230413-070428-root.json
  • 06:59 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 06:56 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46635 and previous config saved to /var/cache/conftool/dbconfig/20230413-065456-root.json
  • 06:49 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46634 and previous config saved to /var/cache/conftool/dbconfig/20230413-064922-root.json
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1114 to clone db1214 T326669', diff saved to https://phabricator.wikimedia.org/P46632 and previous config saved to /var/cache/conftool/dbconfig/20230413-064452-marostegui.json
  • 06:43 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46631 and previous config saved to /var/cache/conftool/dbconfig/20230413-063951-root.json
  • 06:34 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46630 and previous config saved to /var/cache/conftool/dbconfig/20230413-063417-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1221 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46629 and previous config saved to /var/cache/conftool/dbconfig/20230413-062446-root.json
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1221 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46628 and previous config saved to /var/cache/conftool/dbconfig/20230413-062231-marostegui.json
  • 06:19 marostegui@cumin1001: dbctl commit (dc=all): 'db1210 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46627 and previous config saved to /var/cache/conftool/dbconfig/20230413-061913-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1210 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46626 and previous config saved to /var/cache/conftool/dbconfig/20230413-061716-marostegui.json
  • 02:08 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
  • 02:07 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 02:01 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 11s)
  • 02:01 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 02:00 ejegg: civicrm upgraded from 0f37f981 to 2d5ede8d
  • 01:41 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
  • 01:41 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 01:23 fab@deploy2002: Finished deploy [airflow-dags/research@f8dad05]: (no justification provided) (duration: 00m 10s)
  • 01:23 fab@deploy2002: Started deploy [airflow-dags/research@f8dad05]: (no justification provided)
  • 00:22 krinkle@deploy2002: Finished deploy [integration/docroot@f68055d]: (no justification provided) (duration: 00m 28s)
  • 00:21 krinkle@deploy2002: Started deploy [integration/docroot@f68055d]: (no justification provided)

2023-04-12

  • 23:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46625 and previous config saved to /var/cache/conftool/dbconfig/20230412-230933-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46624 and previous config saved to /var/cache/conftool/dbconfig/20230412-225427-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P46623 and previous config saved to /var/cache/conftool/dbconfig/20230412-223921-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46622 and previous config saved to /var/cache/conftool/dbconfig/20230412-222414-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T333332)', diff saved to https://phabricator.wikimedia.org/P46621 and previous config saved to /var/cache/conftool/dbconfig/20230412-222141-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46620 and previous config saved to /var/cache/conftool/dbconfig/20230412-222117-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46619 and previous config saved to /var/cache/conftool/dbconfig/20230412-220611-ladsgroup.json
  • 21:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2007.codfw.wmnet with OS bullseye
  • 21:52 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for sessionstore1001.eqiad.wmnet
  • 21:52 eevans@cumin1001: START - Cookbook sre.hosts.remove-downtime for sessionstore1001.eqiad.wmnet
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P46618 and previous config saved to /var/cache/conftool/dbconfig/20230412-215104-ladsgroup.json
  • 21:38 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2007.codfw.wmnet with reason: host reimage
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46617 and previous config saved to /var/cache/conftool/dbconfig/20230412-213558-ladsgroup.json
  • 21:35 urandom: restarting Cassandra —sessionstore1001— to reenable native transport — T327954
  • 21:35 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2007.codfw.wmnet with reason: host reimage
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46616 and previous config saved to /var/cache/conftool/dbconfig/20230412-213325-ladsgroup.json
  • 21:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46615 and previous config saved to /var/cache/conftool/dbconfig/20230412-213301-ladsgroup.json
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46614 and previous config saved to /var/cache/conftool/dbconfig/20230412-211755-ladsgroup.json
  • 21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2007.codfw.wmnet with OS bullseye
  • 21:04 mutante: gerrit1001 - pushing data over to gerrit1003 via rsync, with bwlimit option: rsync -avp --bwlimit=1m /srv/gerrit/ rsync://gerrit1003.wikimedia.org/gerrit-data/ (T326368)
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P46613 and previous config saved to /var/cache/conftool/dbconfig/20230412-210249-ladsgroup.json
  • 21:01 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs2007.codfw.wmnet with OS bullseye
  • 21:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2007.codfw.wmnet with OS bullseye
  • 20:58 brett: Disable Puppet/PyBal on lvs2007 in preparation for reimaging - T321309
  • 20:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46612 and previous config saved to /var/cache/conftool/dbconfig/20230412-204742-ladsgroup.json
  • 20:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46611 and previous config saved to /var/cache/conftool/dbconfig/20230412-204508-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46610 and previous config saved to /var/cache/conftool/dbconfig/20230412-204445-ladsgroup.json
  • 20:38 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 20:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46609 and previous config saved to /var/cache/conftool/dbconfig/20230412-202939-ladsgroup.json
  • 20:15 zabe@deploy2002: Finished scap: Backport for Drop unused VectorPageTools feature flag (T332090), Set Vector 2022 as default skin on Welsh Wikipedia (T334279) (duration: 10m 19s)
  • 20:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P46608 and previous config saved to /var/cache/conftool/dbconfig/20230412-201432-ladsgroup.json
  • 20:06 zabe@deploy2002: zabe and jdlrobson: Backport for Drop unused VectorPageTools feature flag (T332090), Set Vector 2022 as default skin on Welsh Wikipedia (T334279) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:05 zabe@deploy2002: Started scap: Backport for Drop unused VectorPageTools feature flag (T332090), Set Vector 2022 as default skin on Welsh Wikipedia (T334279)
  • 19:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46606 and previous config saved to /var/cache/conftool/dbconfig/20230412-195926-ladsgroup.json
  • 19:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T333332)', diff saved to https://phabricator.wikimedia.org/P46605 and previous config saved to /var/cache/conftool/dbconfig/20230412-195453-ladsgroup.json
  • 19:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 19:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 19:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46604 and previous config saved to /var/cache/conftool/dbconfig/20230412-195423-ladsgroup.json
  • 19:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 19:43 zabe@deploy2002: Finished scap: Backport for Revert "Ensure ApiHelp correctly types values in TOCData objects", Revert "Ensure ApiHelp correctly types values in TOCData objects" (duration: 06m 40s)
  • 19:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:41 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:40 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46603 and previous config saved to /var/cache/conftool/dbconfig/20230412-193917-ladsgroup.json
  • 19:38 zabe@deploy2002: zabe: Backport for Revert "Ensure ApiHelp correctly types values in TOCData objects", Revert "Ensure ApiHelp correctly types values in TOCData objects" synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 19:37 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:37 zabe@deploy2002: Started scap: Backport for Revert "Ensure ApiHelp correctly types values in TOCData objects", Revert "Ensure ApiHelp correctly types values in TOCData objects"
  • 19:37 urandom: sessionstore1001: systemctl stop cassandra-a.service && systemctl start cassandra-a.service — T327954
  • 19:36 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 19:35 zabe@deploy2002: Sync cancelled.
  • 19:32 zabe@deploy2002: jforrester and zabe: Backport for composer.json: Explicitly pin psr/http-message to 1.0.1 (T333993), Ensure ApiHelp correctly types values in TOCData objects (T334551), Ensure ApiHelp correctly types values in TOCData objects (T334551) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.
  • 19:30 zabe@deploy2002: Started scap: Backport for composer.json: Explicitly pin psr/http-message to 1.0.1 (T333993), Ensure ApiHelp correctly types values in TOCData objects (T334551), Ensure ApiHelp correctly types values in TOCData objects (T334551)
  • 19:28 urandom: restart Cassandra —sessionstore1001— to disable native transport for testing — T327954
  • 19:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P46602 and previous config saved to /var/cache/conftool/dbconfig/20230412-192411-ladsgroup.json
  • 19:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on sessionstore1001.eqiad.wmnet with reason: Reproducing dissonant cluster state
  • 19:16 eevans@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on sessionstore1001.eqiad.wmnet with reason: Reproducing dissonant cluster state
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46601 and previous config saved to /var/cache/conftool/dbconfig/20230412-190904-ladsgroup.json
  • 18:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:42 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:41 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 18:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T333332)', diff saved to https://phabricator.wikimedia.org/P46600 and previous config saved to /var/cache/conftool/dbconfig/20230412-183822-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46599 and previous config saved to /var/cache/conftool/dbconfig/20230412-183758-ladsgroup.json
  • 18:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46598 and previous config saved to /var/cache/conftool/dbconfig/20230412-182252-ladsgroup.json
  • 18:16 dancy@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.4 refs T330210 (duration: 06m 02s)
  • 18:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.4 refs T330210
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P46597 and previous config saved to /var/cache/conftool/dbconfig/20230412-180746-ladsgroup.json
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46596 and previous config saved to /var/cache/conftool/dbconfig/20230412-175240-ladsgroup.json
  • 17:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T333332)', diff saved to https://phabricator.wikimedia.org/P46595 and previous config saved to /var/cache/conftool/dbconfig/20230412-174806-ladsgroup.json
  • 17:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 17:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46594 and previous config saved to /var/cache/conftool/dbconfig/20230412-174743-ladsgroup.json
  • 17:47 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 17:44 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bullseye
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46593 and previous config saved to /var/cache/conftool/dbconfig/20230412-173237-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P46592 and previous config saved to /var/cache/conftool/dbconfig/20230412-171730-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46591 and previous config saved to /var/cache/conftool/dbconfig/20230412-171219-ladsgroup.json
  • 17:06 ejegg: payments-wiki upgraded from efe7e408 to 4dcba0a9
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46590 and previous config saved to /var/cache/conftool/dbconfig/20230412-170224-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46589 and previous config saved to /var/cache/conftool/dbconfig/20230412-165951-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46588 and previous config saved to /var/cache/conftool/dbconfig/20230412-165928-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P46587 and previous config saved to /var/cache/conftool/dbconfig/20230412-165712-ladsgroup.json
  • 16:54 topranks: Updating routing-options on drmrs asw switches to add empty rib inet6 stanza T334281
  • 16:51 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:51 topranks: Updating routing-options on Eqiad lsw1 switches to add empty rib inet6 stanza T334281
  • 16:50 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46586 and previous config saved to /var/cache/conftool/dbconfig/20230412-164422-ladsgroup.json
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P46585 and previous config saved to /var/cache/conftool/dbconfig/20230412-164206-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P46584 and previous config saved to /var/cache/conftool/dbconfig/20230412-162915-ladsgroup.json
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46583 and previous config saved to /var/cache/conftool/dbconfig/20230412-162700-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T333332)', diff saved to https://phabricator.wikimedia.org/P46582 and previous config saved to /var/cache/conftool/dbconfig/20230412-162448-ladsgroup.json
  • 16:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46581 and previous config saved to /var/cache/conftool/dbconfig/20230412-162422-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46580 and previous config saved to /var/cache/conftool/dbconfig/20230412-161409-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T333332)', diff saved to https://phabricator.wikimedia.org/P46579 and previous config saved to /var/cache/conftool/dbconfig/20230412-161135-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 16:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46578 and previous config saved to /var/cache/conftool/dbconfig/20230412-161112-ladsgroup.json
  • 16:09 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P46577 and previous config saved to /var/cache/conftool/dbconfig/20230412-160916-ladsgroup.json
  • 16:05 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 16:04 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 16:04 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 16:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs2010.codfw.wmnet with OS bullseye
  • 16:03 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 16:03 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 16:02 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:58 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 15:57 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:57 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46576 and previous config saved to /var/cache/conftool/dbconfig/20230412-155606-ladsgroup.json
  • 15:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P46575 and previous config saved to /var/cache/conftool/dbconfig/20230412-155410-ladsgroup.json
  • 15:52 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:52 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:49 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:49 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:47 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:47 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:46 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs2010.codfw.wmnet with reason: host reimage
  • 15:45 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:44 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:44 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs2010.codfw.wmnet with reason: host reimage
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P46573 and previous config saved to /var/cache/conftool/dbconfig/20230412-154100-ladsgroup.json
  • 15:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46572 and previous config saved to /var/cache/conftool/dbconfig/20230412-153903-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T333332)', diff saved to https://phabricator.wikimedia.org/P46571 and previous config saved to /var/cache/conftool/dbconfig/20230412-153651-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46570 and previous config saved to /var/cache/conftool/dbconfig/20230412-153627-ladsgroup.json
  • 15:30 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:30 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46569 and previous config saved to /var/cache/conftool/dbconfig/20230412-152553-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T333332)', diff saved to https://phabricator.wikimedia.org/P46568 and previous config saved to /var/cache/conftool/dbconfig/20230412-152320-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 15:22 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs2010.codfw.wmnet with OS bullseye
  • 15:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P46567 and previous config saved to /var/cache/conftool/dbconfig/20230412-152120-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46566 and previous config saved to /var/cache/conftool/dbconfig/20230412-152104-ladsgroup.json
  • 15:14 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 15:14 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P46565 and previous config saved to /var/cache/conftool/dbconfig/20230412-150614-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46564 and previous config saved to /var/cache/conftool/dbconfig/20230412-150557-ladsgroup.json
  • 15:05 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 15:05 kamila@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 15:04 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 15:04 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 15:02 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 15:00 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 15:00 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:59 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 14:59 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:59 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 14:59 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 14:58 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 14:58 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:57 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:57 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:57 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:56 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:56 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:55 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:55 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:54 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:53 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 14:53 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:53 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:52 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:52 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:52 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46563 and previous config saved to /var/cache/conftool/dbconfig/20230412-145108-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P46562 and previous config saved to /var/cache/conftool/dbconfig/20230412-145051-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T333332)', diff saved to https://phabricator.wikimedia.org/P46561 and previous config saved to /var/cache/conftool/dbconfig/20230412-144856-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46560 and previous config saved to /var/cache/conftool/dbconfig/20230412-144815-ladsgroup.json
  • 14:44 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:43 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 14:43 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:43 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:42 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:42 kamila@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:41 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 14:40 moritzm: installing apache security updates on phab1004 (phabricator.wikimedia.org)
  • 14:38 moritzm: installing apache security updates on gerrit1001
  • 14:36 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:36 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46559 and previous config saved to /var/cache/conftool/dbconfig/20230412-143545-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1202 (T333332)', diff saved to https://phabricator.wikimedia.org/P46558 and previous config saved to /var/cache/conftool/dbconfig/20230412-143331-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P46557 and previous config saved to /var/cache/conftool/dbconfig/20230412-143309-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46556 and previous config saved to /var/cache/conftool/dbconfig/20230412-143308-ladsgroup.json
  • 14:32 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:23 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46554 and previous config saved to /var/cache/conftool/dbconfig/20230412-142045-root.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P46553 and previous config saved to /var/cache/conftool/dbconfig/20230412-141802-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46552 and previous config saved to /var/cache/conftool/dbconfig/20230412-141801-ladsgroup.json
  • 14:13 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:10 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes kswiki --fix # T334277, fixed the one remaining link
  • 14:07 moritzm: re-enabled Puppet in codfw/edges after puppetdb maintenance
  • 14:07 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 14:06 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:05 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46550 and previous config saved to /var/cache/conftool/dbconfig/20230412-140540-root.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P46549 and previous config saved to /var/cache/conftool/dbconfig/20230412-140255-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46548 and previous config saved to /var/cache/conftool/dbconfig/20230412-140045-ladsgroup.json
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46547 and previous config saved to /var/cache/conftool/dbconfig/20230412-135959-ladsgroup.json
  • 13:55 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46546 and previous config saved to /var/cache/conftool/dbconfig/20230412-135035-root.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46545 and previous config saved to /var/cache/conftool/dbconfig/20230412-134749-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T333332)', diff saved to https://phabricator.wikimedia.org/P46544 and previous config saved to /var/cache/conftool/dbconfig/20230412-134535-ladsgroup.json
  • 13:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46543 and previous config saved to /var/cache/conftool/dbconfig/20230412-134512-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P46542 and previous config saved to /var/cache/conftool/dbconfig/20230412-134453-ladsgroup.json
  • 13:43 moritzm: stop Puppet in codfw/edges for puppetdb maintenance
  • 13:43 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:39 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Make VE on officewiki use Parsoid directly (T320529 T333402) (duration: 09m 48s)
  • 13:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintenance
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on puppetdb2002.codfw.wmnet with reason: puppetdb maintenance
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46541 and previous config saved to /var/cache/conftool/dbconfig/20230412-133531-root.json
  • 13:34 sukhe: [puppetmaster] sudo /usr/local/sbin/puppet-facts-upload --proxy http://webproxy.eqiad.wmnet:8080 to update PCC
  • 13:30 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and daniel: Backport for Make VE on officewiki use Parsoid directly (T320529 T333402) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46540 and previous config saved to /var/cache/conftool/dbconfig/20230412-133006-ladsgroup.json
  • 13:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P46539 and previous config saved to /var/cache/conftool/dbconfig/20230412-132946-ladsgroup.json
  • 13:29 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Make VE on officewiki use Parsoid directly (T320529 T333402)
  • 13:28 eoghan: Stopping puppet on gitlab hosts to slow-rollout puppet ssh key management - T333840
  • 13:26 elukey: upload AMD ROCm 5.4 debian packages to wikimedia-bullseye:thirdparty/amd-rocm54 - T295661
  • 13:22 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint2002:~$ mwscript namespaceDupes kswiki --fix | tee >(phaste -t T334277) # P46538; errors on stderr, cf. T328634
  • 13:20 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133) (duration: 13m 30s)
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46537 and previous config saved to /var/cache/conftool/dbconfig/20230412-132026-root.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P46535 and previous config saved to /var/cache/conftool/dbconfig/20230412-131459-ladsgroup.json
  • 13:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46533 and previous config saved to /var/cache/conftool/dbconfig/20230412-131440-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46532 and previous config saved to /var/cache/conftool/dbconfig/20230412-131227-ladsgroup.json
  • 13:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46531 and previous config saved to /var/cache/conftool/dbconfig/20230412-131204-ladsgroup.json
  • 13:08 lucaswerkmeister-wmde@deploy2002: sgimeno and lucaswerkmeister-wmde: Backport for GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 13:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for GrowthExperiments: enable add link frontend in 7,8th round wikis (T304551 T308133)
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46530 and previous config saved to /var/cache/conftool/dbconfig/20230412-130521-root.json
  • 13:03 moritzm: installing nodejs security updates on buster
  • 13:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idm1001.wikimedia.org
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46529 and previous config saved to /var/cache/conftool/dbconfig/20230412-125953-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T333332)', diff saved to https://phabricator.wikimedia.org/P46528 and previous config saved to /var/cache/conftool/dbconfig/20230412-125739-ladsgroup.json
  • 12:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idm1001.wikimedia.org
  • 12:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46527 and previous config saved to /var/cache/conftool/dbconfig/20230412-125716-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P46526 and previous config saved to /var/cache/conftool/dbconfig/20230412-125658-ladsgroup.json
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46525 and previous config saved to /var/cache/conftool/dbconfig/20230412-125016-root.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46524 and previous config saved to /var/cache/conftool/dbconfig/20230412-124210-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P46523 and previous config saved to /var/cache/conftool/dbconfig/20230412-124151-ladsgroup.json
  • 12:35 moritzm: installing intel-microcode security updates
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P46522 and previous config saved to /var/cache/conftool/dbconfig/20230412-122703-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46521 and previous config saved to /var/cache/conftool/dbconfig/20230412-122645-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46520 and previous config saved to /var/cache/conftool/dbconfig/20230412-122433-ladsgroup.json
  • 12:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46519 and previous config saved to /var/cache/conftool/dbconfig/20230412-122409-ladsgroup.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T334580', diff saved to https://phabricator.wikimedia.org/P46518 and previous config saved to /var/cache/conftool/dbconfig/20230412-121420-marostegui.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46517 and previous config saved to /var/cache/conftool/dbconfig/20230412-121157-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T333332)', diff saved to https://phabricator.wikimedia.org/P46516 and previous config saved to /var/cache/conftool/dbconfig/20230412-120943-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P46515 and previous config saved to /var/cache/conftool/dbconfig/20230412-120903-ladsgroup.json
  • 12:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46514 and previous config saved to /var/cache/conftool/dbconfig/20230412-120853-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P46513 and previous config saved to /var/cache/conftool/dbconfig/20230412-115357-ladsgroup.json
  • 11:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46512 and previous config saved to /var/cache/conftool/dbconfig/20230412-115347-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46509 and previous config saved to /var/cache/conftool/dbconfig/20230412-113850-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P46508 and previous config saved to /var/cache/conftool/dbconfig/20230412-113840-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46507 and previous config saved to /var/cache/conftool/dbconfig/20230412-113638-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46506 and previous config saved to /var/cache/conftool/dbconfig/20230412-113615-ladsgroup.json
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46505 and previous config saved to /var/cache/conftool/dbconfig/20230412-112334-ladsgroup.json
  • 11:23 marostegui: dbmaint Upgrade db1106 to mariadb 11.1 (eqiad) T333289
  • 11:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T333332)', diff saved to https://phabricator.wikimedia.org/P46504 and previous config saved to /var/cache/conftool/dbconfig/20230412-112217-ladsgroup.json
  • 11:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46503 and previous config saved to /var/cache/conftool/dbconfig/20230412-112154-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P46502 and previous config saved to /var/cache/conftool/dbconfig/20230412-112108-ladsgroup.json
  • 11:12 moritzm: installing gnutls28 security updates on buster
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46501 and previous config saved to /var/cache/conftool/dbconfig/20230412-110647-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P46500 and previous config saved to /var/cache/conftool/dbconfig/20230412-110602-ladsgroup.json
  • 11:00 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw2448.*.codfw.wmnet
  • 10:59 claime: repooling mw2448.codfw.wmnet - T334429
  • 10:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2448.codfw.wmnet
  • 10:59 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for mw2448.codfw.wmnet
  • 10:56 moritzm: installing apache2 security updates on Buster
  • 10:56 moritzm: installing apache2 security updates on Bullseye
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46499 and previous config saved to /var/cache/conftool/dbconfig/20230412-105356-root.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P46498 and previous config saved to /var/cache/conftool/dbconfig/20230412-105141-ladsgroup.json
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46497 and previous config saved to /var/cache/conftool/dbconfig/20230412-105056-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T333332)', diff saved to https://phabricator.wikimedia.org/P46496 and previous config saved to /var/cache/conftool/dbconfig/20230412-104843-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46495 and previous config saved to /var/cache/conftool/dbconfig/20230412-104820-ladsgroup.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46494 and previous config saved to /var/cache/conftool/dbconfig/20230412-103851-root.json
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46493 and previous config saved to /var/cache/conftool/dbconfig/20230412-103635-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46492 and previous config saved to /var/cache/conftool/dbconfig/20230412-103421-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46491 and previous config saved to /var/cache/conftool/dbconfig/20230412-103348-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P46490 and previous config saved to /var/cache/conftool/dbconfig/20230412-103314-ladsgroup.json
  • 10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 04s)
  • 10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
  • 10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 06s)
  • 10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
  • 10:29 hashar@deploy2002: Finished deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet (duration: 00m 02s)
  • 10:29 hashar@deploy2002: Started deploy [integration/docroot@ab848e3]: Dummy deploy with dsh file managed by Puppet
  • 10:28 hashar@deploy2002: Finished deploy [zuul/deploy@4c6859c]: Dummy deploy with dsh file managed by Puppet (duration: 00m 02s)
  • 10:28 hashar@deploy2002: Started deploy [zuul/deploy@4c6859c]: Dummy deploy with dsh file managed by Puppet
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46489 and previous config saved to /var/cache/conftool/dbconfig/20230412-102346-root.json
  • 10:18 Emperor: clearing out 24 ghost objects from Swift T327253
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46488 and previous config saved to /var/cache/conftool/dbconfig/20230412-101841-ladsgroup.json
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P46487 and previous config saved to /var/cache/conftool/dbconfig/20230412-101808-ladsgroup.json
  • 10:10 cgoubert@deploy2002: Synchronized README: (no justification provided) (duration: 05m 44s)
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46486 and previous config saved to /var/cache/conftool/dbconfig/20230412-100841-root.json
  • 10:06 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P46485 and previous config saved to /var/cache/conftool/dbconfig/20230412-100335-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46484 and previous config saved to /var/cache/conftool/dbconfig/20230412-100301-ladsgroup.json
  • 10:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1123 to clone db1223 T326669', diff saved to https://phabricator.wikimedia.org/P46482 and previous config saved to /var/cache/conftool/dbconfig/20230412-100111-marostegui.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T333332)', diff saved to https://phabricator.wikimedia.org/P46481 and previous config saved to /var/cache/conftool/dbconfig/20230412-100049-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46480 and previous config saved to /var/cache/conftool/dbconfig/20230412-100026-ladsgroup.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46479 and previous config saved to /var/cache/conftool/dbconfig/20230412-095336-root.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46478 and previous config saved to /var/cache/conftool/dbconfig/20230412-094829-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T333332)', diff saved to https://phabricator.wikimedia.org/P46477 and previous config saved to /var/cache/conftool/dbconfig/20230412-094615-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46476 and previous config saved to /var/cache/conftool/dbconfig/20230412-094551-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P46475 and previous config saved to /var/cache/conftool/dbconfig/20230412-094520-ladsgroup.json
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46474 and previous config saved to /var/cache/conftool/dbconfig/20230412-093831-root.json
  • 09:34 claime: Reverted migrating cxserver to mw-api-int on kubernetes - T334204
  • 09:34 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:34 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46473 and previous config saved to /var/cache/conftool/dbconfig/20230412-093045-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P46472 and previous config saved to /var/cache/conftool/dbconfig/20230412-093013-ladsgroup.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46470 and previous config saved to /var/cache/conftool/dbconfig/20230412-092327-root.json
  • 09:21 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 09:21 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:20 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P46469 and previous config saved to /var/cache/conftool/dbconfig/20230412-091539-ladsgroup.json
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46468 and previous config saved to /var/cache/conftool/dbconfig/20230412-091507-ladsgroup.json
  • 09:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T333332)', diff saved to https://phabricator.wikimedia.org/P46467 and previous config saved to /var/cache/conftool/dbconfig/20230412-091255-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 09:12 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46466 and previous config saved to /var/cache/conftool/dbconfig/20230412-091151-ladsgroup.json
  • 09:11 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:11 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:07 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 09:06 claime: Migrating cxserver to mw-api-int on kubernetes - T334204
  • 09:04 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46464 and previous config saved to /var/cache/conftool/dbconfig/20230412-090032-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46463 and previous config saved to /var/cache/conftool/dbconfig/20230412-085816-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P46462 and previous config saved to /var/cache/conftool/dbconfig/20230412-085644-ladsgroup.json
  • 08:51 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 08:51 aqu@deploy2002: Finished deploy [airflow-dags/analytics@18ae3be]: Deploy airflow-dags including webrequest load job - Analytics [airflow-dags@18ae3be] (duration: 00m 12s)
  • 08:50 aqu@deploy2002: Started deploy [airflow-dags/analytics@18ae3be]: Deploy airflow-dags including webrequest load job - Analytics [airflow-dags@18ae3be]
  • 08:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P46460 and previous config saved to /var/cache/conftool/dbconfig/20230412-084138-ladsgroup.json
  • 08:37 marostegui: dbmaint Deploy schema change on s1 codfw with replication T334536
  • 08:35 aqu@deploy2002: Finished deploy [analytics/refinery@f3389dc] (thin): Deploy analytics_refinery in production thin [analytics/refinery@f3389dc] (duration: 00m 07s)
  • 08:35 aqu@deploy2002: Started deploy [analytics/refinery@f3389dc] (thin): Deploy analytics_refinery in production thin [analytics/refinery@f3389dc]
  • 08:35 moritzm: imported puppet 5.5.22-2+deb12u1 for bookworm-wikimedia component/puppet5 T330495
  • 08:34 aqu@deploy2002: Finished deploy [analytics/refinery@f3389dc]: Deploy analytics_refinery in production [analytics/refinery@f3389dc] (duration: 00m 41s)
  • 08:34 aqu@deploy2002: Started deploy [analytics/refinery@f3389dc]: Deploy analytics_refinery in production [analytics/refinery@f3389dc]
  • 08:33 aqu: About to deploy analytics/refinery in production
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46459 and previous config saved to /var/cache/conftool/dbconfig/20230412-082632-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T333332)', diff saved to https://phabricator.wikimedia.org/P46458 and previous config saved to /var/cache/conftool/dbconfig/20230412-082424-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46457 and previous config saved to /var/cache/conftool/dbconfig/20230412-082400-ladsgroup.json
  • 08:17 hashar@deploy2002: Synchronized wmf-config/CommonSettings-labs.php: [Beta Cluster] Replicate WebResponseSetCookie wgHooks migration here too - T333926 (duration: 05m 51s)
  • 08:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P46456 and previous config saved to /var/cache/conftool/dbconfig/20230412-080854-ladsgroup.json
  • 08:03 marostegui: dbmaint Deploy schema change on s3 codfw with replication enabled (only for testwiki and test2wiki) T334536
  • 08:01 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 100%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46455 and previous config saved to /var/cache/conftool/dbconfig/20230412-075703-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46454 and previous config saved to /var/cache/conftool/dbconfig/20230412-075422-root.json
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P46453 and previous config saved to /var/cache/conftool/dbconfig/20230412-075348-ladsgroup.json
  • 07:45 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 75%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46451 and previous config saved to /var/cache/conftool/dbconfig/20230412-074158-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1107 from dbctl T334447', diff saved to https://phabricator.wikimedia.org/P46450 and previous config saved to /var/cache/conftool/dbconfig/20230412-073921-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46449 and previous config saved to /var/cache/conftool/dbconfig/20230412-073917-root.json
  • 07:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46448 and previous config saved to /var/cache/conftool/dbconfig/20230412-073841-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T333332)', diff saved to https://phabricator.wikimedia.org/P46447 and previous config saved to /var/cache/conftool/dbconfig/20230412-073633-ladsgroup.json
  • 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:36 moritzm: installing python-cryptography security updates
  • 07:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46446 and previous config saved to /var/cache/conftool/dbconfig/20230412-073550-ladsgroup.json
  • 07:30 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 07:28 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46445 and previous config saved to /var/cache/conftool/dbconfig/20230412-072812-root.json
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 50%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46444 and previous config saved to /var/cache/conftool/dbconfig/20230412-072654-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46443 and previous config saved to /var/cache/conftool/dbconfig/20230412-072412-root.json
  • 07:21 moritzm: installing xen security updates
  • 07:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P46442 and previous config saved to /var/cache/conftool/dbconfig/20230412-072044-ladsgroup.json
  • 07:16 marostegui: Drop flaggedrevs tables from ptwikisource T332594
  • 07:13 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46441 and previous config saved to /var/cache/conftool/dbconfig/20230412-071307-root.json
  • 07:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 25%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46440 and previous config saved to /var/cache/conftool/dbconfig/20230412-071149-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46439 and previous config saved to /var/cache/conftool/dbconfig/20230412-070907-root.json
  • 07:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P46438 and previous config saved to /var/cache/conftool/dbconfig/20230412-070538-ladsgroup.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46437 and previous config saved to /var/cache/conftool/dbconfig/20230412-065802-root.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 10%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46436 and previous config saved to /var/cache/conftool/dbconfig/20230412-065644-root.json
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46435 and previous config saved to /var/cache/conftool/dbconfig/20230412-065402-root.json
  • 06:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46434 and previous config saved to /var/cache/conftool/dbconfig/20230412-065032-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46433 and previous config saved to /var/cache/conftool/dbconfig/20230412-064823-ladsgroup.json
  • 06:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 06:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46432 and previous config saved to /var/cache/conftool/dbconfig/20230412-064800-ladsgroup.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46431 and previous config saved to /var/cache/conftool/dbconfig/20230412-064257-root.json
  • 06:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 5%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46430 and previous config saved to /var/cache/conftool/dbconfig/20230412-064139-root.json
  • 06:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46429 and previous config saved to /var/cache/conftool/dbconfig/20230412-063858-root.json
  • 06:38 vgutierrez: restart haproxy on cp2035 - T334448
  • 06:33 marostegui: Stop mariadb on db1121 to clone db1221 this will generate lag on clouddb replicas for s4 T326669
  • 06:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P46427 and previous config saved to /var/cache/conftool/dbconfig/20230412-063253-ladsgroup.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1121 to clone db1221 T326669', diff saved to https://phabricator.wikimedia.org/P46426 and previous config saved to /var/cache/conftool/dbconfig/20230412-063224-marostegui.json
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46425 and previous config saved to /var/cache/conftool/dbconfig/20230412-062752-root.json
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 4%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46424 and previous config saved to /var/cache/conftool/dbconfig/20230412-062634-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46423 and previous config saved to /var/cache/conftool/dbconfig/20230412-062353-root.json
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P46422 and previous config saved to /var/cache/conftool/dbconfig/20230412-061747-ladsgroup.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46421 and previous config saved to /var/cache/conftool/dbconfig/20230412-061248-root.json
  • 06:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 3%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46420 and previous config saved to /var/cache/conftool/dbconfig/20230412-061129-root.json
  • 06:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46419 and previous config saved to /var/cache/conftool/dbconfig/20230412-060241-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T333332)', diff saved to https://phabricator.wikimedia.org/P46418 and previous config saved to /var/cache/conftool/dbconfig/20230412-060133-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46417 and previous config saved to /var/cache/conftool/dbconfig/20230412-060109-ladsgroup.json
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46416 and previous config saved to /var/cache/conftool/dbconfig/20230412-055743-root.json
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 2%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46415 and previous config saved to /var/cache/conftool/dbconfig/20230412-055624-root.json
  • 05:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P46414 and previous config saved to /var/cache/conftool/dbconfig/20230412-054603-ladsgroup.json
  • 05:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 to clone db1210 T326669', diff saved to https://phabricator.wikimedia.org/P46412 and previous config saved to /var/cache/conftool/dbconfig/20230412-054258-marostegui.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46411 and previous config saved to /var/cache/conftool/dbconfig/20230412-054238-root.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1218 (re)pooling @ 1%: Pooling db1218 T326669', diff saved to https://phabricator.wikimedia.org/P46410 and previous config saved to /var/cache/conftool/dbconfig/20230412-054120-root.json
  • 05:41 krinkle@deploy2002: Synchronized php-1.41.0-wmf.4/includes/libs/objectcache/: Ie3a2215d33: disable WANCache cool-off feature (duration: 06m 00s)
  • 05:40 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1218 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46409 and previous config saved to /var/cache/conftool/dbconfig/20230412-054024-marostegui.json
  • 05:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P46408 and previous config saved to /var/cache/conftool/dbconfig/20230412-053057-ladsgroup.json
  • 05:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1222 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46407 and previous config saved to /var/cache/conftool/dbconfig/20230412-052733-root.json
  • 05:25 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1222 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46406 and previous config saved to /var/cache/conftool/dbconfig/20230412-052504-marostegui.json
  • 05:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46405 and previous config saved to /var/cache/conftool/dbconfig/20230412-051550-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T333332)', diff saved to https://phabricator.wikimedia.org/P46404 and previous config saved to /var/cache/conftool/dbconfig/20230412-051342-ladsgroup.json
  • 05:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 05:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46403 and previous config saved to /var/cache/conftool/dbconfig/20230412-051319-ladsgroup.json
  • 04:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P46402 and previous config saved to /var/cache/conftool/dbconfig/20230412-045813-ladsgroup.json
  • 04:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P46401 and previous config saved to /var/cache/conftool/dbconfig/20230412-044306-ladsgroup.json
  • 04:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46400 and previous config saved to /var/cache/conftool/dbconfig/20230412-042800-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46399 and previous config saved to /var/cache/conftool/dbconfig/20230412-042552-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46398 and previous config saved to /var/cache/conftool/dbconfig/20230412-042510-ladsgroup.json
  • 04:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P46397 and previous config saved to /var/cache/conftool/dbconfig/20230412-041003-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P46396 and previous config saved to /var/cache/conftool/dbconfig/20230412-035457-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46395 and previous config saved to /var/cache/conftool/dbconfig/20230412-033951-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T333332)', diff saved to https://phabricator.wikimedia.org/P46394 and previous config saved to /var/cache/conftool/dbconfig/20230412-033742-ladsgroup.json
  • 03:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 03:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 03:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46393 and previous config saved to /var/cache/conftool/dbconfig/20230412-033719-ladsgroup.json
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P46392 and previous config saved to /var/cache/conftool/dbconfig/20230412-032213-ladsgroup.json
  • 03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143', diff saved to https://phabricator.wikimedia.org/P46391 and previous config saved to /var/cache/conftool/dbconfig/20230412-030707-ladsgroup.json
  • 02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46390 and previous config saved to /var/cache/conftool/dbconfig/20230412-025200-ladsgroup.json
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1143 (T333332)', diff saved to https://phabricator.wikimedia.org/P46389 and previous config saved to /var/cache/conftool/dbconfig/20230412-024952-ladsgroup.json
  • 02:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46388 and previous config saved to /var/cache/conftool/dbconfig/20230412-024929-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P46387 and previous config saved to /var/cache/conftool/dbconfig/20230412-023422-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P46386 and previous config saved to /var/cache/conftool/dbconfig/20230412-021916-ladsgroup.json
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46385 and previous config saved to /var/cache/conftool/dbconfig/20230412-020410-ladsgroup.json
  • 02:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1142 (T333332)', diff saved to https://phabricator.wikimedia.org/P46384 and previous config saved to /var/cache/conftool/dbconfig/20230412-020201-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1142.eqiad.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46383 and previous config saved to /var/cache/conftool/dbconfig/20230412-020138-ladsgroup.json
  • 01:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P46382 and previous config saved to /var/cache/conftool/dbconfig/20230412-014632-ladsgroup.json
  • 01:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P46381 and previous config saved to /var/cache/conftool/dbconfig/20230412-013126-ladsgroup.json
  • 01:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46380 and previous config saved to /var/cache/conftool/dbconfig/20230412-011619-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T333332)', diff saved to https://phabricator.wikimedia.org/P46379 and previous config saved to /var/cache/conftool/dbconfig/20230412-011411-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46378 and previous config saved to /var/cache/conftool/dbconfig/20230412-011348-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46377 and previous config saved to /var/cache/conftool/dbconfig/20230412-010832-ladsgroup.json
  • 00:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P46376 and previous config saved to /var/cache/conftool/dbconfig/20230412-005841-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P46375 and previous config saved to /var/cache/conftool/dbconfig/20230412-005325-ladsgroup.json
  • 00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P46374 and previous config saved to /var/cache/conftool/dbconfig/20230412-004335-ladsgroup.json
  • 00:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P46373 and previous config saved to /var/cache/conftool/dbconfig/20230412-003819-ladsgroup.json
  • 00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46372 and previous config saved to /var/cache/conftool/dbconfig/20230412-002829-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1138 (T333332)', diff saved to https://phabricator.wikimedia.org/P46371 and previous config saved to /var/cache/conftool/dbconfig/20230412-002620-ladsgroup.json
  • 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1138.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46370 and previous config saved to /var/cache/conftool/dbconfig/20230412-002557-ladsgroup.json
  • 00:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46369 and previous config saved to /var/cache/conftool/dbconfig/20230412-002312-ladsgroup.json
  • 00:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P46368 and previous config saved to /var/cache/conftool/dbconfig/20230412-001051-ladsgroup.json

2023-04-11

  • 23:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P46367 and previous config saved to /var/cache/conftool/dbconfig/20230411-235544-ladsgroup.json
  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T333332)', diff saved to https://phabricator.wikimedia.org/P46366 and previous config saved to /var/cache/conftool/dbconfig/20230411-235225-ladsgroup.json
  • 23:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46365 and previous config saved to /var/cache/conftool/dbconfig/20230411-235202-ladsgroup.json
  • 23:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46364 and previous config saved to /var/cache/conftool/dbconfig/20230411-234038-ladsgroup.json
  • 23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T333332)', diff saved to https://phabricator.wikimedia.org/P46363 and previous config saved to /var/cache/conftool/dbconfig/20230411-233930-ladsgroup.json
  • 23:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P46362 and previous config saved to /var/cache/conftool/dbconfig/20230411-233655-ladsgroup.json
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P46361 and previous config saved to /var/cache/conftool/dbconfig/20230411-232149-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46360 and previous config saved to /var/cache/conftool/dbconfig/20230411-230643-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T333332)', diff saved to https://phabricator.wikimedia.org/P46359 and previous config saved to /var/cache/conftool/dbconfig/20230411-223732-ladsgroup.json
  • 22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46358 and previous config saved to /var/cache/conftool/dbconfig/20230411-223651-ladsgroup.json
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P46357 and previous config saved to /var/cache/conftool/dbconfig/20230411-222145-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P46356 and previous config saved to /var/cache/conftool/dbconfig/20230411-220638-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46355 and previous config saved to /var/cache/conftool/dbconfig/20230411-215132-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T333332)', diff saved to https://phabricator.wikimedia.org/P46354 and previous config saved to /var/cache/conftool/dbconfig/20230411-212053-ladsgroup.json
  • 21:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 20:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46353 and previous config saved to /var/cache/conftool/dbconfig/20230411-205239-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P46352 and previous config saved to /var/cache/conftool/dbconfig/20230411-203733-ladsgroup.json
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P46351 and previous config saved to /var/cache/conftool/dbconfig/20230411-202227-ladsgroup.json
  • 20:19 mforns@deploy2002: Finished deploy [airflow-dags/analytics@fcc4c9b]: (no justification provided) (duration: 00m 11s)
  • 20:19 mforns@deploy2002: Started deploy [airflow-dags/analytics@fcc4c9b]: (no justification provided)
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46350 and previous config saved to /var/cache/conftool/dbconfig/20230411-200720-ladsgroup.json
  • 20:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T333332)', diff saved to https://phabricator.wikimedia.org/P46349 and previous config saved to /var/cache/conftool/dbconfig/20230411-193640-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46348 and previous config saved to /var/cache/conftool/dbconfig/20230411-193628-ladsgroup.json
  • 19:31 ejegg: payments-wiki upgraded from ad6e5801 to 153bdf64
  • 19:29 ejegg: civicrm upgraded from e2fdb4a4 to 0f37f981
  • 19:22 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1003']
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P46347 and previous config saved to /var/cache/conftool/dbconfig/20230411-192122-ladsgroup.json
  • 19:19 eileen: civicrm upgraded from b573aee4 to e2fdb4a4
  • 19:16 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
  • 19:16 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1002']
  • 19:10 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1002']
  • 19:08 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P46346 and previous config saved to /var/cache/conftool/dbconfig/20230411-190616-ladsgroup.json
  • 19:05 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 19:05 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:59 ebysans@deploy2002: Finished deploy [airflow-dags/analytics@d2cd28d]: (no justification provided) (duration: 00m 11s)
  • 18:59 ebysans@deploy2002: Started deploy [airflow-dags/analytics@d2cd28d]: (no justification provided)
  • 18:58 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:57 andrew@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirtlocal1001']
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46345 and previous config saved to /var/cache/conftool/dbconfig/20230411-185110-ladsgroup.json
  • 18:50 andrew@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1001']
  • 18:38 demon@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.4 refs T330210
  • 18:32 zabe@deploy2002: Finished scap: close wowikiquote (T334482) (duration: 06m 46s)
  • 18:25 zabe@deploy2002: Started scap: close wowikiquote (T334482)
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T333332)', diff saved to https://phabricator.wikimedia.org/P46344 and previous config saved to /var/cache/conftool/dbconfig/20230411-182024-ladsgroup.json
  • 18:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 18:20 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bullseye
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46343 and previous config saved to /var/cache/conftool/dbconfig/20230411-181123-ladsgroup.json
  • 17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3006.esams.wmnet with OS bullseye
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P46342 and previous config saved to /var/cache/conftool/dbconfig/20230411-175617-ladsgroup.json
  • 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3006.esams.wmnet with reason: host reimage
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P46341 and previous config saved to /var/cache/conftool/dbconfig/20230411-174110-ladsgroup.json
  • 17:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3006.esams.wmnet with reason: host reimage
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46340 and previous config saved to /var/cache/conftool/dbconfig/20230411-172604-ladsgroup.json
  • 17:17 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3006.esams.wmnet with OS bullseye
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T333332)', diff saved to https://phabricator.wikimedia.org/P46339 and previous config saved to /var/cache/conftool/dbconfig/20230411-171600-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46338 and previous config saved to /var/cache/conftool/dbconfig/20230411-171537-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P46337 and previous config saved to /var/cache/conftool/dbconfig/20230411-170031-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P46336 and previous config saved to /var/cache/conftool/dbconfig/20230411-164524-ladsgroup.json
  • 16:33 sbassett: Deployed security mitigation update for T333140
  • 16:33 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46335 and previous config saved to /var/cache/conftool/dbconfig/20230411-163018-ladsgroup.json
  • 16:30 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 16:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:27 mforns@deploy2002: Finished deploy [airflow-dags/analytics@ce3d4d6]: (no justification provided) (duration: 00m 11s)
  • 16:27 mforns@deploy2002: Started deploy [airflow-dags/analytics@ce3d4d6]: (no justification provided)
  • 16:23 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T333332)', diff saved to https://phabricator.wikimedia.org/P46334 and previous config saved to /var/cache/conftool/dbconfig/20230411-162020-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46333 and previous config saved to /var/cache/conftool/dbconfig/20230411-161956-ladsgroup.json
  • 16:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:19 brett: Disable Puppet/PyBal on lvs3006 in preparation for reimaging - T321309
  • 16:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:12 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:11 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 16:09 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:08 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:07 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod
  • 16:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1132.eqiad.wmnet with reason: More tests are needed before the host can be added to prod
  • 16:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1132.eqiad.wmnet with OS buster
  • 16:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P46332 and previous config saved to /var/cache/conftool/dbconfig/20230411-160450-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P46331 and previous config saved to /var/cache/conftool/dbconfig/20230411-154943-ladsgroup.json
  • 15:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46330 and previous config saved to /var/cache/conftool/dbconfig/20230411-153437-ladsgroup.json
  • 15:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1132.eqiad.wmnet with reason: host reimage
  • 15:33 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:32 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T333332)', diff saved to https://phabricator.wikimedia.org/P46329 and previous config saved to /var/cache/conftool/dbconfig/20230411-152438-ladsgroup.json
  • 15:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 15:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46328 and previous config saved to /var/cache/conftool/dbconfig/20230411-152413-ladsgroup.json
  • 15:21 moritzm: installing xen security updates
  • 15:13 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc] (hadoop-test): Update pageview hourly table with referer data field TEST [analytics/refinery@f3389dc] (duration: 01m 28s)
  • 15:11 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc] (hadoop-test): Update pageview hourly table with referer data field TEST [analytics/refinery@f3389dc]
  • 15:10 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc] (thin): Update pageview hourly table with referer data field THIN [analytics/refinery@f3389dc] (duration: 00m 08s)
  • 15:10 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc] (thin): Update pageview hourly table with referer data field THIN [analytics/refinery@f3389dc]
  • 15:09 ebysans@deploy2002: Finished deploy [analytics/refinery@f3389dc]: Update pageview hourly table with referer data field [analytics/refinery@f3389dc] (duration: 05m 34s)
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P46327 and previous config saved to /var/cache/conftool/dbconfig/20230411-150907-ladsgroup.json
  • 15:03 ebysans@deploy2002: Started deploy [analytics/refinery@f3389dc]: Update pageview hourly table with referer data field [analytics/refinery@f3389dc]
  • 15:01 SandraEbele: deploying analytics refinery to update hive pageview hourly table with referer_data field.
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P46326 and previous config saved to /var/cache/conftool/dbconfig/20230411-145401-ladsgroup.json
  • 14:53 SandraEbele: paused pageview hourly job.
  • 14:51 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1005 ipv6 - herron@cumin1001"
  • 14:48 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host an-worker1132.eqiad.wmnet with OS buster
  • 14:47 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1005 ipv6 - herron@cumin1001"
  • 14:45 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 14:42 moritzm: installing Tomcat security updates
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46325 and previous config saved to /var/cache/conftool/dbconfig/20230411-143854-ladsgroup.json
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 14:29 jnuche@deploy2002: Installing scap version "4.49.0" for 590 hosts
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T333332)', diff saved to https://phabricator.wikimedia.org/P46324 and previous config saved to /var/cache/conftool/dbconfig/20230411-142857-ladsgroup.json
  • 14:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 14:27 jnuche@deploy2002: Installing scap version "4.49.0" for 590 hosts
  • 14:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46323 and previous config saved to /var/cache/conftool/dbconfig/20230411-141944-ladsgroup.json
  • 14:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: HW failure
  • 14:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2448.codfw.wmnet with reason: HW failure
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P46321 and previous config saved to /var/cache/conftool/dbconfig/20230411-140438-ladsgroup.json
  • 14:00 claime: Revoking kafka_main-codfw_broker and kafka_main-eqiad_broker puppet CA certs - T319372
  • 13:55 elukey: remove old puppet certificates for kafka main brokers from A:kafka-main - T319372
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P46320 and previous config saved to /var/cache/conftool/dbconfig/20230411-134932-ladsgroup.json
  • 13:46 elukey: powercycle analytics1069, down for some days now, host stuck from the mgmt/serial console
  • 13:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46319 and previous config saved to /var/cache/conftool/dbconfig/20230411-133425-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T333332)', diff saved to https://phabricator.wikimedia.org/P46318 and previous config saved to /var/cache/conftool/dbconfig/20230411-132348-ladsgroup.json
  • 13:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46317 and previous config saved to /var/cache/conftool/dbconfig/20230411-132324-ladsgroup.json
  • 13:21 taavi@deploy2002: Finished scap: Backport for Deploy Nearby feature on most wikis [2/2] (T334079) (duration: 08m 25s)
  • 13:14 taavi@deploy2002: wmde-fisch and taavi: Backport for Deploy Nearby feature on most wikis [2/2] (T334079) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:13 taavi@deploy2002: Started scap: Backport for Deploy Nearby feature on most wikis [2/2] (T334079)
  • 13:11 taavi@deploy2002: Finished scap: Backport for Deploy Nearby feature on most wikis [1/2] (T334079) (duration: 07m 24s)
  • 13:09 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2003.wikimedia.org with OS bullseye
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P46316 and previous config saved to /var/cache/conftool/dbconfig/20230411-130817-ladsgroup.json
  • 13:05 taavi@deploy2002: taavi and wmde-fisch: Backport for Deploy Nearby feature on most wikis [1/2] (T334079) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:04 taavi@deploy2002: Started scap: Backport for Deploy Nearby feature on most wikis [1/2] (T334079)
  • 12:54 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P46315 and previous config saved to /var/cache/conftool/dbconfig/20230411-125310-ladsgroup.json
  • 12:50 jelto@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2003.wikimedia.org with reason: host reimage
  • 12:38 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46314 and previous config saved to /var/cache/conftool/dbconfig/20230411-123803-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T333332)', diff saved to https://phabricator.wikimedia.org/P46313 and previous config saved to /var/cache/conftool/dbconfig/20230411-122735-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 12:24 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=mw2448.*.codfw.wmnet
  • 12:24 claime: Setting mw2448.codfw.wmnet to pooled=invalid - T334429
  • 12:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 12:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46312 and previous config saved to /var/cache/conftool/dbconfig/20230411-115137-root.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46311 and previous config saved to /var/cache/conftool/dbconfig/20230411-113631-root.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46310 and previous config saved to /var/cache/conftool/dbconfig/20230411-112126-root.json
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46309 and previous config saved to /var/cache/conftool/dbconfig/20230411-111854-root.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46308 and previous config saved to /var/cache/conftool/dbconfig/20230411-110621-root.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46307 and previous config saved to /var/cache/conftool/dbconfig/20230411-110349-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46306 and previous config saved to /var/cache/conftool/dbconfig/20230411-105116-root.json
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T334447', diff saved to https://phabricator.wikimedia.org/P46305 and previous config saved to /var/cache/conftool/dbconfig/20230411-105100-marostegui.json
  • 10:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46304 and previous config saved to /var/cache/conftool/dbconfig/20230411-104844-root.json
  • 10:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46303 and previous config saved to /var/cache/conftool/dbconfig/20230411-103611-root.json
  • 10:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46302 and previous config saved to /var/cache/conftool/dbconfig/20230411-103339-root.json
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46301 and previous config saved to /var/cache/conftool/dbconfig/20230411-102106-root.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46300 and previous config saved to /var/cache/conftool/dbconfig/20230411-101835-root.json
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46298 and previous config saved to /var/cache/conftool/dbconfig/20230411-100330-root.json
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46297 and previous config saved to /var/cache/conftool/dbconfig/20230411-094825-root.json
  • 09:44 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46296 and previous config saved to /var/cache/conftool/dbconfig/20230411-093320-root.json
  • 09:27 Amir1: start of watchlist clean up of a user in wikidatawiki (T328501)
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46295 and previous config saved to /var/cache/conftool/dbconfig/20230411-092224-root.json
  • 09:20 moritzm: installing nodejs security updates on buster
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46294 and previous config saved to /var/cache/conftool/dbconfig/20230411-091815-root.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46293 and previous config saved to /var/cache/conftool/dbconfig/20230411-090720-root.json
  • 09:04 moritzm: installing pcre2 security updates
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1118 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46292 and previous config saved to /var/cache/conftool/dbconfig/20230411-090310-root.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122 to clone db1222 T326669', diff saved to https://phabricator.wikimedia.org/P46290 and previous config saved to /var/cache/conftool/dbconfig/20230411-085654-marostegui.json
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46289 and previous config saved to /var/cache/conftool/dbconfig/20230411-085215-root.json
  • 08:50 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46288 and previous config saved to /var/cache/conftool/dbconfig/20230411-083710-root.json
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46287 and previous config saved to /var/cache/conftool/dbconfig/20230411-083339-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46286 and previous config saved to /var/cache/conftool/dbconfig/20230411-083106-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46285 and previous config saved to /var/cache/conftool/dbconfig/20230411-082521-root.json
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46284 and previous config saved to /var/cache/conftool/dbconfig/20230411-082205-root.json
  • 08:19 aqu@deploy2002: Finished deploy [analytics/refinery@bed78f6] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 3rd try [analytics/refinery@bed78f6] (duration: 01m 25s)
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46283 and previous config saved to /var/cache/conftool/dbconfig/20230411-081834-root.json
  • 08:18 aqu@deploy2002: Started deploy [analytics/refinery@bed78f6] (hadoop-test): Deploy analytics_refinery including last webrequest load scripts in TEST 3rd try [analytics/refinery@bed78f6]
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46282 and previous config saved to /var/cache/conftool/dbconfig/20230411-081601-root.json
  • 08:15 aqu: About to deploy analytics/refinery (To migrate webrequest load from Oozie to Airflow)
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46281 and previous config saved to /var/cache/conftool/dbconfig/20230411-081016-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46280 and previous config saved to /var/cache/conftool/dbconfig/20230411-080700-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46279 and previous config saved to /var/cache/conftool/dbconfig/20230411-080329-root.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46278 and previous config saved to /var/cache/conftool/dbconfig/20230411-080057-root.json
  • 07:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46277 and previous config saved to /var/cache/conftool/dbconfig/20230411-075511-root.json
  • 07:54 vgutierrez: restart haproxy on cp2033 - T334448
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46276 and previous config saved to /var/cache/conftool/dbconfig/20230411-075155-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46275 and previous config saved to /var/cache/conftool/dbconfig/20230411-074824-root.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46274 and previous config saved to /var/cache/conftool/dbconfig/20230411-074552-root.json
  • 07:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46273 and previous config saved to /var/cache/conftool/dbconfig/20230411-074006-root.json
  • 07:39 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1103.eqiad.wmnet
  • 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:39 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1103.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:37 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1103.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46272 and previous config saved to /var/cache/conftool/dbconfig/20230411-073651-root.json
  • 07:35 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46271 and previous config saved to /var/cache/conftool/dbconfig/20230411-073319-root.json
  • 07:30 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1103.eqiad.wmnet
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46270 and previous config saved to /var/cache/conftool/dbconfig/20230411-073047-root.json
  • 07:30 dcausse: restarting blazegraph on wdqs1007 (stuck for 48hours)
  • 07:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46269 and previous config saved to /var/cache/conftool/dbconfig/20230411-072501-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46268 and previous config saved to /var/cache/conftool/dbconfig/20230411-072146-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46267 and previous config saved to /var/cache/conftool/dbconfig/20230411-071815-root.json
  • 07:18 zabe@deploy2002: Finished scap: Backport for Add blkwiki to wgSitename (T334351) (duration: 08m 08s)
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1103 from dbctl T332293', diff saved to https://phabricator.wikimedia.org/P46266 and previous config saved to /var/cache/conftool/dbconfig/20230411-071647-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46265 and previous config saved to /var/cache/conftool/dbconfig/20230411-071542-root.json
  • 07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 393731
  • 07:11 zabe@deploy2002: zabe and jhsoby: Backport for Add blkwiki to wgSitename (T334351) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 393731
  • 07:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 150279
  • 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 150279
  • 07:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 35467
  • 07:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 35467
  • 07:10 zabe@deploy2002: Started scap: Backport for Add blkwiki to wgSitename (T334351)
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46264 and previous config saved to /var/cache/conftool/dbconfig/20230411-070956-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1211 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46263 and previous config saved to /var/cache/conftool/dbconfig/20230411-070641-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1211 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46262 and previous config saved to /var/cache/conftool/dbconfig/20230411-070609-marostegui.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46261 and previous config saved to /var/cache/conftool/dbconfig/20230411-070310-root.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1110 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46260 and previous config saved to /var/cache/conftool/dbconfig/20230411-070037-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1118 T334375', diff saved to https://phabricator.wikimedia.org/P46258 and previous config saved to /var/cache/conftool/dbconfig/20230411-065734-marostegui.json
  • 06:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1163 to s1 primary T334375', diff saved to https://phabricator.wikimedia.org/P46257 and previous config saved to /var/cache/conftool/dbconfig/20230411-065639-root.json
  • 06:56 marostegui: Starting s1 eqiad failover from db1118 to db1163 - T334375
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46256 and previous config saved to /var/cache/conftool/dbconfig/20230411-065452-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46255 and previous config saved to /var/cache/conftool/dbconfig/20230411-064805-root.json
  • 06:43 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46254 and previous config saved to /var/cache/conftool/dbconfig/20230411-063947-root.json
  • 06:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46252 and previous config saved to /var/cache/conftool/dbconfig/20230411-063300-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46251 and previous config saved to /var/cache/conftool/dbconfig/20230411-062442-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1163 with weight 0 T334375', diff saved to https://phabricator.wikimedia.org/P46250 and previous config saved to /var/cache/conftool/dbconfig/20230411-062127-root.json
  • 06:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T334375
  • 06:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T334375
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1209 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46249 and previous config saved to /var/cache/conftool/dbconfig/20230411-061755-root.json
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1209 to dbctl T326206', diff saved to https://phabricator.wikimedia.org/P46248 and previous config saved to /var/cache/conftool/dbconfig/20230411-061642-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1110 to clone db1210 T326669', diff saved to https://phabricator.wikimedia.org/P46246 and previous config saved to /var/cache/conftool/dbconfig/20230411-061044-marostegui.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46245 and previous config saved to /var/cache/conftool/dbconfig/20230411-060937-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1224 to dbctl T326206', diff saved to https://phabricator.wikimedia.org/P46244 and previous config saved to /var/cache/conftool/dbconfig/20230411-060922-marostegui.json
  • 05:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swakiyama out of all services on: 814 hosts
  • 05:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Swakiyama out of all services on: 814 hosts
  • 05:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swakiyama out of all services on: 1241 hosts
  • 05:43 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Swakiyama out of all services on: 1241 hosts
  • 04:10 eileen: civicrm upgraded from bc2f5ccc to b573aee4
  • 03:54 mwpresync@deploy2002: Pruned MediaWiki: 1.41.0-wmf.2 (duration: 02m 15s)
  • 03:52 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.4 refs T330210 (duration: 49m 57s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.4 refs T330210
  • 00:37 eileen: civicrm upgraded from 001e156a to bc2f5ccc
  • 00:13 eileen: civicrm upgraded from 223f655a to 001e156a

2023-04-10

  • 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb1002.eqiad.wmnet
  • 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:07 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 23:06 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 23:00 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:55 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts miscweb1002.eqiad.wmnet
  • 22:53 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on miscweb1002.eqiad.wmnet with reason: decom
  • 22:53 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on miscweb1002.eqiad.wmnet with reason: decom
  • 21:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3005.esams.wmnet with OS bullseye
  • 21:46 urandom: restarting Cassandra, sessionstore1001-a, to restore native transport settings — T327954
  • 21:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
  • 21:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
  • 21:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
  • 21:31 urandom: restarting Cassandra, sessionstore1002-a — T327954
  • 21:22 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 21:21 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
  • 21:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3005.esams.wmnet with OS bullseye
  • 21:14 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs3005.esams.wmnet with OS bullseye
  • 21:13 sbassett: Deployed updated security mitigation for T333140
  • 21:10 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 21:08 eevans@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host sessionstore1001.eqiad.wmnet
  • 21:06 urandom: restarting Cassandra, sessionstore1003-a — T327954
  • 21:04 urandom: restarting Cassandra, sessionstore1002-a — T327954
  • 20:57 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
  • 20:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3005.esams.wmnet with reason: host reimage
  • 20:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3005.esams.wmnet with OS bullseye
  • 20:09 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1003']
  • 19:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1003']
  • 19:52 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1002']
  • 19:52 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1002']
  • 19:51 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirtlocal1001']
  • 19:51 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirtlocal1001']
  • 19:48 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:35 brett: Disable Puppet/PyBal on lvs3005 in preparation for reimaging - T321309
  • 19:25 mutante: mw2488 - scap pull - T334429
  • 19:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs6002.drmrs.wmnet
  • 19:22 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs6002.drmrs.wmnet
  • 19:19 mforns@deploy2002: Finished deploy [airflow-dags/analytics@6d6f1ec]: (no justification provided) (duration: 00m 11s)
  • 19:19 mforns@deploy2002: Started deploy [airflow-dags/analytics@6d6f1ec]: (no justification provided)
  • 19:16 mutante: power-cycling mw2448 - down, no console output T334429
  • 19:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS bullseye
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
  • 18:43 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
  • 18:34 herron@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:34 herron@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1004 ipv6 - herron@cumin1001"
  • 18:33 herron@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add kafka-logging1004 ipv6 - herron@cumin1001"
  • 18:31 herron@cumin1001: START - Cookbook sre.dns.netbox
  • 18:22 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS bullseye
  • 18:16 krinkle@deploy2002: Synchronized wmf-config/: (no justification provided) (duration: 587m 34s)
  • 17:29 brett: Disable Puppet/PyBal on lvs6002 in preparation for reimaging - T321309
  • 16:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS bullseye
  • 16:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
  • 16:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
  • 16:05 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bullseye
  • 15:53 herron: centrallog1002:~# systemctl restart rsyslog
  • 15:46 brett: Disable Puppet/PyBal on lvs6001 in preparation for reimaging - T321309
  • 14:57 sukhe: enable puppet on A:lvs and A:ulsfo to merge 906580
  • 14:52 sukhe: disable puppet on A:lvs and A:ulsfo to merge 906580
  • 14:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46242 and previous config saved to /var/cache/conftool/dbconfig/20230410-141052-root.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46241 and previous config saved to /var/cache/conftool/dbconfig/20230410-135547-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46240 and previous config saved to /var/cache/conftool/dbconfig/20230410-134042-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46239 and previous config saved to /var/cache/conftool/dbconfig/20230410-132538-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46238 and previous config saved to /var/cache/conftool/dbconfig/20230410-131033-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46237 and previous config saved to /var/cache/conftool/dbconfig/20230410-125528-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46236 and previous config saved to /var/cache/conftool/dbconfig/20230410-124023-root.json
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46235 and previous config saved to /var/cache/conftool/dbconfig/20230410-122112-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46234 and previous config saved to /var/cache/conftool/dbconfig/20230410-120607-root.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46233 and previous config saved to /var/cache/conftool/dbconfig/20230410-115102-root.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46232 and previous config saved to /var/cache/conftool/dbconfig/20230410-114733-root.json
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46231 and previous config saved to /var/cache/conftool/dbconfig/20230410-113557-root.json
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46230 and previous config saved to /var/cache/conftool/dbconfig/20230410-113228-root.json
  • 11:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201 to clone db1224 T326669', diff saved to https://phabricator.wikimedia.org/P46228 and previous config saved to /var/cache/conftool/dbconfig/20230410-112524-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46227 and previous config saved to /var/cache/conftool/dbconfig/20230410-112052-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46226 and previous config saved to /var/cache/conftool/dbconfig/20230410-111723-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46225 and previous config saved to /var/cache/conftool/dbconfig/20230410-110548-root.json
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46224 and previous config saved to /var/cache/conftool/dbconfig/20230410-110218-root.json
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 4%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46222 and previous config saved to /var/cache/conftool/dbconfig/20230410-105043-root.json
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46221 and previous config saved to /var/cache/conftool/dbconfig/20230410-104714-root.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 3%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46220 and previous config saved to /var/cache/conftool/dbconfig/20230410-103538-root.json
  • 10:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46219 and previous config saved to /var/cache/conftool/dbconfig/20230410-103209-root.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 2%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46218 and previous config saved to /var/cache/conftool/dbconfig/20230410-102033-root.json
  • 10:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46217 and previous config saved to /var/cache/conftool/dbconfig/20230410-101704-root.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: Pooling T334080', diff saved to https://phabricator.wikimedia.org/P46216 and previous config saved to /var/cache/conftool/dbconfig/20230410-100528-root.json
  • 10:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46215 and previous config saved to /var/cache/conftool/dbconfig/20230410-100159-root.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1183 to s5 depooled T334080', diff saved to https://phabricator.wikimedia.org/P46214 and previous config saved to /var/cache/conftool/dbconfig/20230410-095846-marostegui.json
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46213 and previous config saved to /var/cache/conftool/dbconfig/20230410-095530-root.json
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46212 and previous config saved to /var/cache/conftool/dbconfig/20230410-094654-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46211 and previous config saved to /var/cache/conftool/dbconfig/20230410-094025-root.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1109 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46210 and previous config saved to /var/cache/conftool/dbconfig/20230410-093149-root.json
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46209 and previous config saved to /var/cache/conftool/dbconfig/20230410-092520-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46207 and previous config saved to /var/cache/conftool/dbconfig/20230410-091015-root.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46206 and previous config saved to /var/cache/conftool/dbconfig/20230410-090141-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46205 and previous config saved to /var/cache/conftool/dbconfig/20230410-085511-root.json
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Pooling', diff saved to https://phabricator.wikimedia.org/P46204 and previous config saved to /var/cache/conftool/dbconfig/20230410-085117-root.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46203 and previous config saved to /var/cache/conftool/dbconfig/20230410-084636-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46202 and previous config saved to /var/cache/conftool/dbconfig/20230410-084006-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Pooling', diff saved to https://phabricator.wikimedia.org/P46201 and previous config saved to /var/cache/conftool/dbconfig/20230410-083613-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46200 and previous config saved to /var/cache/conftool/dbconfig/20230410-083131-root.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46199 and previous config saved to /var/cache/conftool/dbconfig/20230410-082501-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Pooling', diff saved to https://phabricator.wikimedia.org/P46198 and previous config saved to /var/cache/conftool/dbconfig/20230410-082108-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46197 and previous config saved to /var/cache/conftool/dbconfig/20230410-081626-root.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46196 and previous config saved to /var/cache/conftool/dbconfig/20230410-080956-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Pooling', diff saved to https://phabricator.wikimedia.org/P46195 and previous config saved to /var/cache/conftool/dbconfig/20230410-080603-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46194 and previous config saved to /var/cache/conftool/dbconfig/20230410-080121-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 100%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46193 and previous config saved to /var/cache/conftool/dbconfig/20230410-080115-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46192 and previous config saved to /var/cache/conftool/dbconfig/20230410-075451-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Pooling', diff saved to https://phabricator.wikimedia.org/P46191 and previous config saved to /var/cache/conftool/dbconfig/20230410-075058-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46190 and previous config saved to /var/cache/conftool/dbconfig/20230410-074617-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 75%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46189 and previous config saved to /var/cache/conftool/dbconfig/20230410-074610-root.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1161 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46188 and previous config saved to /var/cache/conftool/dbconfig/20230410-073947-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 5%: Pooling', diff saved to https://phabricator.wikimedia.org/P46187 and previous config saved to /var/cache/conftool/dbconfig/20230410-073553-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46186 and previous config saved to /var/cache/conftool/dbconfig/20230410-073112-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 50%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46185 and previous config saved to /var/cache/conftool/dbconfig/20230410-073105-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1163', diff saved to https://phabricator.wikimedia.org/P46184 and previous config saved to /var/cache/conftool/dbconfig/20230410-072206-marostegui.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 4%: Pooling', diff saved to https://phabricator.wikimedia.org/P46183 and previous config saved to /var/cache/conftool/dbconfig/20230410-072048-root.json
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T326669', diff saved to https://phabricator.wikimedia.org/P46181 and previous config saved to /var/cache/conftool/dbconfig/20230410-071747-marostegui.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 25%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46180 and previous config saved to /var/cache/conftool/dbconfig/20230410-071600-root.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46179 and previous config saved to /var/cache/conftool/dbconfig/20230410-070948-root.json
  • 07:09 marostegui@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts db1101.eqiad.wmnet
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:09 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1101.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 3%: Pooling', diff saved to https://phabricator.wikimedia.org/P46178 and previous config saved to /var/cache/conftool/dbconfig/20230410-070544-root.json
  • 07:05 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1101.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:03 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 10%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46177 and previous config saved to /var/cache/conftool/dbconfig/20230410-070056-root.json
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1101.eqiad.wmnet
  • 06:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46176 and previous config saved to /var/cache/conftool/dbconfig/20230410-065443-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1103 T334374', diff saved to https://phabricator.wikimedia.org/P46175 and previous config saved to /var/cache/conftool/dbconfig/20230410-065149-marostegui.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1179 to x1 primary T334374', diff saved to https://phabricator.wikimedia.org/P46174 and previous config saved to /var/cache/conftool/dbconfig/20230410-065047-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 2%: Pooling', diff saved to https://phabricator.wikimedia.org/P46173 and previous config saved to /var/cache/conftool/dbconfig/20230410-065039-root.json
  • 06:50 marostegui: Starting x1 eqiad failover from db1103 to db1179 - T334374
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 5%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46172 and previous config saved to /var/cache/conftool/dbconfig/20230410-064551-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46171 and previous config saved to /var/cache/conftool/dbconfig/20230410-063939-root.json
  • 06:39 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1179 with weight 0 T334374', diff saved to https://phabricator.wikimedia.org/P46170 and previous config saved to /var/cache/conftool/dbconfig/20230410-063916-root.json
  • 06:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 12 hosts with reason: Primary switchover x1 T334374
  • 06:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 12 hosts with reason: Primary switchover x1 T334374
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1220 (re)pooling @ 1%: Pooling', diff saved to https://phabricator.wikimedia.org/P46169 and previous config saved to /var/cache/conftool/dbconfig/20230410-063534-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1220 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46168 and previous config saved to /var/cache/conftool/dbconfig/20230410-063458-marostegui.json
  • 06:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 4%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46167 and previous config saved to /var/cache/conftool/dbconfig/20230410-063046-root.json
  • 06:24 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46166 and previous config saved to /var/cache/conftool/dbconfig/20230410-062434-root.json
  • 06:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 3%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46165 and previous config saved to /var/cache/conftool/dbconfig/20230410-061541-root.json
  • 06:09 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46164 and previous config saved to /var/cache/conftool/dbconfig/20230410-060929-root.json
  • 06:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 2%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46163 and previous config saved to /var/cache/conftool/dbconfig/20230410-060037-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46162 and previous config saved to /var/cache/conftool/dbconfig/20230410-055424-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1161 T334080', diff saved to https://phabricator.wikimedia.org/P46160 and previous config saved to /var/cache/conftool/dbconfig/20230410-055005-marostegui.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1207 (re)pooling @ 1%: Pooling T326669', diff saved to https://phabricator.wikimedia.org/P46159 and previous config saved to /var/cache/conftool/dbconfig/20230410-054532-root.json
  • 05:45 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1207 to dbctl T326669', diff saved to https://phabricator.wikimedia.org/P46158 and previous config saved to /var/cache/conftool/dbconfig/20230410-054504-marostegui.json
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'es1022 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46157 and previous config saved to /var/cache/conftool/dbconfig/20230410-053919-root.json

2023-04-08

  • 17:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1073']

2023-04-07

  • 18:19 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c4ebda]: (no justification provided) (duration: 00m 35s)
  • 18:18 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c4ebda]: (no justification provided)
  • 17:02 urandom: restart Cassandra, sessionstore1001-a (re-enabling CQL) — T327954
  • 11:05 aqu@deploy2002: Finished deploy [analytics/refinery@e70da10] (hadoop-test): Deploy analytics_refinery including last webrquest load scripts in TEST 2nd try [analytics/refinery@e70da10] (duration: 01m 33s)
  • 11:03 aqu@deploy2002: Started deploy [analytics/refinery@e70da10] (hadoop-test): Deploy analytics_refinery including last webrquest load scripts in TEST 2nd try [analytics/refinery@e70da10]
  • 10:40 aqu@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (hadoop-test): Deploy analytics_refinery including last webrquest load scripts in TEST [analytics/refinery@eb4c2b2] (duration: 00m 06s)
  • 10:40 aqu@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (hadoop-test): Deploy analytics_refinery including last webrquest load scripts in TEST [analytics/refinery@eb4c2b2]
  • 10:34 aqu: About to deploy analytics/refinery in test cluster
  • 09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sonicmgmt - ayounsi@cumin1001"
  • 09:22 ayounsi@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sonicmgmt - ayounsi@cumin1001"
  • 09:20 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 01:17 urandom: rebooting sessionstore1001 — T327954
  • 01:10 urandom: rebooting sessionstore1001 — T327954
  • 01:02 urandom: rebooting sessionstore1001 — T327954
  • 00:39 urandom: rebooting sessionstore1001 — T327954

2023-04-06

  • 22:05 ejegg: SmashPig upgraded from 7c19151f to 24d700f4
  • 22:04 ejegg: payments-wiki upgraded from 75b068a1 to 0f15a101
  • 21:52 sbassett: Deployed updated mitigation for T333140
  • 21:19 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:18 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46154 and previous config saved to /var/cache/conftool/dbconfig/20230406-211054-ladsgroup.json
  • 21:05 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:04 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:02 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:02 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:00 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:59 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:57 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P46153 and previous config saved to /var/cache/conftool/dbconfig/20230406-205548-ladsgroup.json
  • 20:53 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:50 eevans@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe1014.eqiad.wmnet
  • 20:49 eevans@cumin1001: conftool action : set/pooled=yes; selector: name=ms-fe1013.eqiad.wmnet
  • 20:49 eevans@cumin1001: conftool action : set/weight=40; selector: name=ms-fe1014.eqiad.wmnet
  • 20:49 eevans@cumin1001: conftool action : set/weight=40; selector: name=ms-fe1013.eqiad.wmnet
  • 20:45 eevans@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 20:45 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "remove info for new ssw as need to set back to planned to make homer happy - cmooney@cumin1001 - T322937"
  • 20:43 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "remove info for new ssw as need to set back to planned to make homer happy - cmooney@cumin1001 - T322937"
  • 20:41 eevans@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P46152 and previous config saved to /var/cache/conftool/dbconfig/20230406-204041-ladsgroup.json
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46151 and previous config saved to /var/cache/conftool/dbconfig/20230406-202535-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46150 and previous config saved to /var/cache/conftool/dbconfig/20230406-202319-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46149 and previous config saved to /var/cache/conftool/dbconfig/20230406-202256-ladsgroup.json
  • 20:16 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1014.eqiad.wmnet
  • 20:15 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1013.eqiad.wmnet
  • 20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1014.eqiad.wmnet
  • 20:09 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe1013.eqiad.wmnet
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P46148 and previous config saved to /var/cache/conftool/dbconfig/20230406-200750-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P46147 and previous config saved to /var/cache/conftool/dbconfig/20230406-195243-ladsgroup.json
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46146 and previous config saved to /var/cache/conftool/dbconfig/20230406-193737-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46145 and previous config saved to /var/cache/conftool/dbconfig/20230406-193510-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46144 and previous config saved to /var/cache/conftool/dbconfig/20230406-193447-ladsgroup.json
  • 19:26 mforns@deploy2002: Finished deploy [airflow-dags/analytics@b454afd]: (no justification provided) (duration: 00m 11s)
  • 19:26 mforns@deploy2002: Started deploy [airflow-dags/analytics@b454afd]: (no justification provided)
  • 19:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P46143 and previous config saved to /var/cache/conftool/dbconfig/20230406-191941-ladsgroup.json
  • 19:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P46142 and previous config saved to /var/cache/conftool/dbconfig/20230406-190435-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46141 and previous config saved to /var/cache/conftool/dbconfig/20230406-184929-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46140 and previous config saved to /var/cache/conftool/dbconfig/20230406-184701-ladsgroup.json
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 18:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46139 and previous config saved to /var/cache/conftool/dbconfig/20230406-184638-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P46138 and previous config saved to /var/cache/conftool/dbconfig/20230406-183132-ladsgroup.json
  • 18:18 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3007.esams.wmnet with OS bullseye
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P46137 and previous config saved to /var/cache/conftool/dbconfig/20230406-181625-ladsgroup.json
  • 18:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3007.esams.wmnet with reason: host reimage
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46136 and previous config saved to /var/cache/conftool/dbconfig/20230406-180119-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T333332)', diff saved to https://phabricator.wikimedia.org/P46135 and previous config saved to /var/cache/conftool/dbconfig/20230406-175854-ladsgroup.json
  • 17:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3007.esams.wmnet with reason: host reimage
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46134 and previous config saved to /var/cache/conftool/dbconfig/20230406-175813-ladsgroup.json
  • 17:49 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 17:49 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 17:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P46133 and previous config saved to /var/cache/conftool/dbconfig/20230406-174306-ladsgroup.json
  • 17:36 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs3007.esams.wmnet with OS bullseye
  • 17:34 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 17:34 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 17:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-f1-eqiad down.
  • 17:32 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-f1-eqiad down.
  • 17:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 17:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad.mgmt with reason: test on ssw1-e1-eqiad will take ospf on lsw1-e1-eqiad down.
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P46132 and previous config saved to /var/cache/conftool/dbconfig/20230406-172800-ladsgroup.json
  • 17:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs3007.esams.wmnet
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46131 and previous config saved to /var/cache/conftool/dbconfig/20230406-171254-ladsgroup.json
  • 17:12 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3007.esams.wmnet
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T333332)', diff saved to https://phabricator.wikimedia.org/P46130 and previous config saved to /var/cache/conftool/dbconfig/20230406-171028-ladsgroup.json
  • 17:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 17:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46129 and previous config saved to /var/cache/conftool/dbconfig/20230406-170928-ladsgroup.json
  • 17:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics@318480e]: Fix for dump_month_of_daily_pageviews dag - Analytics [airflow-dags@318480e] (duration: 00m 14s)
  • 17:05 aqu@deploy2002: Started deploy [airflow-dags/analytics@318480e]: Fix for dump_month_of_daily_pageviews dag - Analytics [airflow-dags@318480e]
  • 16:58 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P46128 and previous config saved to /var/cache/conftool/dbconfig/20230406-165422-ladsgroup.json
  • 16:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs6003.drmrs.wmnet
  • 16:41 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs6003.drmrs.wmnet
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P46127 and previous config saved to /var/cache/conftool/dbconfig/20230406-163916-ladsgroup.json
  • 16:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bullseye
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46126 and previous config saved to /var/cache/conftool/dbconfig/20230406-162409-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T333332)', diff saved to https://phabricator.wikimedia.org/P46125 and previous config saved to /var/cache/conftool/dbconfig/20230406-162144-ladsgroup.json
  • 16:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46124 and previous config saved to /var/cache/conftool/dbconfig/20230406-162120-ladsgroup.json
  • 16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 16:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P46123 and previous config saved to /var/cache/conftool/dbconfig/20230406-160614-ladsgroup.json
  • 16:05 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 16:05 topranks: Enable BGP EVPN sessions between eqiad row e/f Leaf and Spine devices
  • 15:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P46122 and previous config saved to /var/cache/conftool/dbconfig/20230406-155108-ladsgroup.json
  • 15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bullseye
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46121 and previous config saved to /var/cache/conftool/dbconfig/20230406-153602-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T333332)', diff saved to https://phabricator.wikimedia.org/P46120 and previous config saved to /var/cache/conftool/dbconfig/20230406-153335-ladsgroup.json
  • 15:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 15:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 15:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46119 and previous config saved to /var/cache/conftool/dbconfig/20230406-153312-ladsgroup.json
  • 15:28 ladsgroup@deploy2002: Finished scap: Backport for Disable writes on group2 for DT backend (duration: 08m 11s)
  • 15:21 ladsgroup@deploy2002: ladsgroup: Backport for Disable writes on group2 for DT backend synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 15:20 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 11s)
  • 15:20 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
  • 15:20 ladsgroup@deploy2002: Started scap: Backport for Disable writes on group2 for DT backend
  • 15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P46118 and previous config saved to /var/cache/conftool/dbconfig/20230406-151806-ladsgroup.json
  • 15:18 jgiannelos@deploy2002: Finished deploy [restbase/deploy@8fb20e9]: (no justification provided) (duration: 21m 01s)
  • 15:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P46117 and previous config saved to /var/cache/conftool/dbconfig/20230406-150300-ladsgroup.json
  • 14:57 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
  • 14:57 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs6003.drmrs.wmnet with OS bullseye
  • 14:57 jgiannelos@deploy2002: Started deploy [restbase/deploy@8fb20e9]: (no justification provided)
  • 14:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46116 and previous config saved to /var/cache/conftool/dbconfig/20230406-144753-ladsgroup.json
  • 14:46 ladsgroup@deploy2002: Finished scap: Backport for Disable DT backend on enwiki (duration: 07m 14s)
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T333332)', diff saved to https://phabricator.wikimedia.org/P46115 and previous config saved to /var/cache/conftool/dbconfig/20230406-144437-ladsgroup.json
  • 14:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 14:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46114 and previous config saved to /var/cache/conftool/dbconfig/20230406-144332-ladsgroup.json
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new ssw1 spine switches in eqiad. - cmooney@cumin1001 - T322937"
  • 14:40 ladsgroup@deploy2002: ladsgroup: Backport for Disable DT backend on enwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:40 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new ssw1 spine switches in eqiad. - cmooney@cumin1001 - T322937"
  • 14:39 ladsgroup@deploy2002: Started scap: Backport for Disable DT backend on enwiki
  • 14:39 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 14:37 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2009*} and A:lvs (T320967)
  • 14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 14:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
  • 14:33 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 14:30 hnowlan@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2010*} and A:lvs (T320967)
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P46113 and previous config saved to /var/cache/conftool/dbconfig/20230406-142826-ladsgroup.json
  • 14:21 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:21 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:21 elukey: upgrade istioctl on deploy[12]002 and istio-cni on ml-serve[12]00[1-8] manually - T334068
  • 14:14 elukey: upload new istio-cni and istioctl 1.15.7 debian package versions to bullseye-wikimedia - T334068
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P46112 and previous config saved to /var/cache/conftool/dbconfig/20230406-141319-ladsgroup.json
  • 14:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bullseye
  • 14:10 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Add session schema config for mobile apps (T331481) (duration: 07m 54s)
  • 14:08 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 11s)
  • 14:08 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
  • 14:03 lucaswerkmeister-wmde@deploy2002: mazevedo and lucaswerkmeister-wmde: Backport for Add session schema config for mobile apps (T331481) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 14:02 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Add session schema config for mobile apps (T331481)
  • 14:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts lvs6003.drmrs.wmnet
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46111 and previous config saved to /var/cache/conftool/dbconfig/20230406-135813-ladsgroup.json
  • 13:56 urandom: rebooting sessionstore1001 — T327954
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T333332)', diff saved to https://phabricator.wikimedia.org/P46110 and previous config saved to /var/cache/conftool/dbconfig/20230406-135604-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46109 and previous config saved to /var/cache/conftool/dbconfig/20230406-135541-ladsgroup.json
  • 13:51 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs6003.drmrs.wmnet
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P46108 and previous config saved to /var/cache/conftool/dbconfig/20230406-134035-ladsgroup.json
  • 13:40 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 13:34 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P46106 and previous config saved to /var/cache/conftool/dbconfig/20230406-132528-ladsgroup.json
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46104 and previous config saved to /var/cache/conftool/dbconfig/20230406-131022-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T333332)', diff saved to https://phabricator.wikimedia.org/P46103 and previous config saved to /var/cache/conftool/dbconfig/20230406-130812-ladsgroup.json
  • 13:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46102 and previous config saved to /var/cache/conftool/dbconfig/20230406-130749-ladsgroup.json
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P46101 and previous config saved to /var/cache/conftool/dbconfig/20230406-125242-ladsgroup.json
  • 12:50 godog: import grafana 9.4 T317887
  • 12:41 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 12:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P46100 and previous config saved to /var/cache/conftool/dbconfig/20230406-123735-ladsgroup.json
  • 12:26 dcausse: restarting blazegraph on wdqs1012 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46099 and previous config saved to /var/cache/conftool/dbconfig/20230406-122229-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T333332)', diff saved to https://phabricator.wikimedia.org/P46098 and previous config saved to /var/cache/conftool/dbconfig/20230406-122018-ladsgroup.json
  • 12:20 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46097 and previous config saved to /var/cache/conftool/dbconfig/20230406-121955-ladsgroup.json
  • 12:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P46096 and previous config saved to /var/cache/conftool/dbconfig/20230406-120448-ladsgroup.json
  • 11:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P46095 and previous config saved to /var/cache/conftool/dbconfig/20230406-114942-ladsgroup.json
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46094 and previous config saved to /var/cache/conftool/dbconfig/20230406-113436-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T333332)', diff saved to https://phabricator.wikimedia.org/P46093 and previous config saved to /var/cache/conftool/dbconfig/20230406-113226-ladsgroup.json
  • 11:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46092 and previous config saved to /var/cache/conftool/dbconfig/20230406-113203-ladsgroup.json
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P46091 and previous config saved to /var/cache/conftool/dbconfig/20230406-111657-ladsgroup.json
  • 11:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P46090 and previous config saved to /var/cache/conftool/dbconfig/20230406-110151-ladsgroup.json
  • 10:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46089 and previous config saved to /var/cache/conftool/dbconfig/20230406-104644-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T333332)', diff saved to https://phabricator.wikimedia.org/P46088 and previous config saved to /var/cache/conftool/dbconfig/20230406-104435-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46087 and previous config saved to /var/cache/conftool/dbconfig/20230406-104319-ladsgroup.json
  • 10:41 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:41 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:40 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P46086 and previous config saved to /var/cache/conftool/dbconfig/20230406-102813-ladsgroup.json
  • 10:28 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 10:27 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 10:27 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:26 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:13 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1003.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P46085 and previous config saved to /var/cache/conftool/dbconfig/20230406-101306-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46084 and previous config saved to /var/cache/conftool/dbconfig/20230406-095800-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T333332)', diff saved to https://phabricator.wikimedia.org/P46083 and previous config saved to /var/cache/conftool/dbconfig/20230406-095640-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 09:43 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:42 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:39 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:38 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:38 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 09:30 elukey: kafka main codfw cluster migrated to PKI TLS certs for brokers - T319372
  • 09:22 jelto@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2003.wikimedia.org with OS bullseye
  • 09:19 cgoubert@deploy2002: Finished scap: Backport for jobrunners: Raise memory_limit to match parsoid (T333528) (duration: 07m 11s)
  • 09:13 cgoubert@deploy2002: cgoubert: Backport for jobrunners: Raise memory_limit to match parsoid (T333528) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:12 cgoubert@deploy2002: Started scap: Backport for jobrunners: Raise memory_limit to match parsoid (T333528)
  • 08:40 elukey: powercycle ml-serve2004 - host frozen, racadm getsel shows multi-bit errors in various DIMM slots
  • 08:28 jelto@cumin2002: START - Cookbook sre.hosts.reimage for host gitlab2003.wikimedia.org with OS bullseye
  • 08:09 hashar@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.3 refs T330209
  • 08:08 volans: restarting update-ubuntu-mirror.service on mirror1001 to check if it was a transient error
  • 07:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 07:31 apergos: UTC morning backport and config training window done
  • 07:28 moritzm: installing ghostscript security updates
  • 07:19 kartik@deploy2002: Finished scap: Backport for Enable Section Translation on Kashmiri Wikipedia (T326541) (duration: 09m 31s)
  • 07:16 zabe: zabe@mwmaint2002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Abuse filter maintainer" "Abuse filter maintainers" "Zabe" --reason "per request T334147"
  • 07:11 kartik@deploy2002: kartik: Backport for Enable Section Translation on Kashmiri Wikipedia (T326541) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:09 kartik@deploy2002: Started scap: Backport for Enable Section Translation on Kashmiri Wikipedia (T326541)
  • 02:07 fab@deploy2002: Finished deploy [airflow-dags/research@2192f15]: (no justification provided) (duration: 00m 21s)
  • 02:06 fab@deploy2002: Started deploy [airflow-dags/research@2192f15]: (no justification provided)
  • 00:50 urandom: rebooting sessionstore1001 — T327954
  • 00:19 urandom: rebooting Cassandra on sessionstore1001 — T327954

2023-04-05

  • 23:58 legoktm@deploy2002: Finished scap: Backport for Remove misleading "disable" of Special:Mostlinkedcategories (T310456) (duration: 07m 55s)
  • 23:55 urandom: rebooting Cassandra on sessionstore1001 — T327954
  • 23:52 legoktm@deploy2002: legoktm: Backport for Remove misleading "disable" of Special:Mostlinkedcategories (T310456) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 23:50 legoktm@deploy2002: Started scap: Backport for Remove misleading "disable" of Special:Mostlinkedcategories (T310456)
  • 23:44 legoktm@deploy2002: Finished scap: Backport for Add <link rel="me"> to verify Mastodon account on mediawiki.org (duration: 07m 47s)
  • 23:38 legoktm@deploy2002: legoktm: Backport for Add <link rel="me"> to verify Mastodon account on mediawiki.org synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 23:36 legoktm@deploy2002: Started scap: Backport for Add <link rel="me"> to verify Mastodon account on mediawiki.org
  • 22:36 topranks: enabling lsw1-e1-eqiad port et-0/0/51 to ssw1-e1-eqiad et-0/0/80 T322937
  • 22:33 urandom: rebooting Cassandra on sessionstore1001 — T327954
  • 22:21 urandom: restarting Cassandra on sessionstore1001 to apply (intentionally) unreachable native transport — T327954
  • 22:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS bullseye
  • 21:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
  • 21:31 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:31 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:30 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:28 cjming: end of UTC late backport window
  • 21:23 cjming@deploy2002: Finished scap: Backport for [mgwiki] Replace the wordmark on Vector 2022 (T334022) (duration: 07m 58s)
  • 21:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:16 cjming@deploy2002: superpes and cjming: Backport for [mgwiki] Replace the wordmark on Vector 2022 (T334022) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:16 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS bullseye
  • 21:15 cjming@deploy2002: Started scap: Backport for [mgwiki] Replace the wordmark on Vector 2022 (T334022)
  • 21:10 cjming@deploy2002: Finished scap: Backport for Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681) (duration: 10m 06s)
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:09 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for ssw link addresses in eqiad - cmooney@cumin1001"
  • 21:07 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:02 cjming@deploy2002: cjming and nray: Backport for Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 21:01 cjming: UTC late backport & config window continuing
  • 21:00 cjming@deploy2002: Started scap: Backport for Add static mobile United_States page to facilitate synthetic testing of T331681 (T331681)
  • 20:58 cjming@deploy2002: Finished scap: Backport for Undeploy SimilarEditors from Beta (T331718) (duration: 35m 41s)
  • 20:57 brett: Disable Puppet/PyBal on lvs5005 in preparation for reimaging - T321309
  • 20:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5004.eqsin.wmnet with OS bullseye
  • 20:44 cjming@deploy2002: tsepothoabala and cjming: Backport for Undeploy SimilarEditors from Beta (T331718) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:24 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 20:22 cjming@deploy2002: Started scap: Backport for Undeploy SimilarEditors from Beta (T331718)
  • 20:21 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
  • 20:17 mforns@deploy2002: Finished deploy [airflow-dags/analytics@2192f15]: (no justification provided) (duration: 00m 12s)
  • 20:17 mforns@deploy2002: Started deploy [airflow-dags/analytics@2192f15]: (no justification provided)
  • 20:03 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eb4c2b2] (duration: 01m 34s)
  • 20:01 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eb4c2b2]
  • 20:01 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2] (thin): Regular analytics weekly train THIN [analytics/refinery@eb4c2b2] (duration: 00m 08s)
  • 20:01 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2] (thin): Regular analytics weekly train THIN [analytics/refinery@eb4c2b2]
  • 20:01 mforns@deploy2002: Finished deploy [analytics/refinery@eb4c2b2]: Regular analytics weekly train [analytics/refinery@eb4c2b2] (duration: 06m 26s)
  • 19:54 mforns@deploy2002: Started deploy [analytics/refinery@eb4c2b2]: Regular analytics weekly train [analytics/refinery@eb4c2b2]
  • 19:52 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5004.eqsin.wmnet with OS bullseye
  • 19:30 brett: Disable Puppet/PyBal on lvs5004 in preparation for reimaging - T321309
  • 19:27 mforns@deploy2002: Finished deploy [analytics/refinery@944a995] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@944a995] (duration: 01m 29s)
  • 19:25 mforns@deploy2002: Started deploy [analytics/refinery@944a995] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@944a995]
  • 19:25 mforns@deploy2002: Finished deploy [analytics/refinery@944a995] (thin): Regular analytics weekly train THIN [analytics/refinery@944a995] (duration: 00m 08s)
  • 19:25 mforns@deploy2002: Started deploy [analytics/refinery@944a995] (thin): Regular analytics weekly train THIN [analytics/refinery@944a995]
  • 19:25 mforns@deploy2002: Finished deploy [analytics/refinery@944a995]: Regular analytics weekly train [analytics/refinery@944a995] (duration: 06m 31s)
  • 19:19 mforns@deploy2002: Started deploy [analytics/refinery@944a995]: Regular analytics weekly train [analytics/refinery@944a995]
  • 19:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4009.ulsfo.wmnet with OS bullseye
  • 18:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4009.ulsfo.wmnet with reason: host reimage
  • 18:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4009.ulsfo.wmnet with reason: host reimage
  • 18:37 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4009.ulsfo.wmnet with OS bullseye
  • 18:37 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs4009.ulsfo.wmnet with OS bullseye
  • 17:50 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4009.ulsfo.wmnet with OS bullseye
  • 17:32 brett: Disable Puppet/PyBal on lvs4009 in preparation for reimaging - T321309
  • 17:28 cjming: deploying labs-only change
  • 17:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bullseye
  • 17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 16:56 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists1003.wikimedia.org with reason: Moar CPUs!
  • 16:56 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on lists1003.wikimedia.org with reason: Moar CPUs!
  • 16:54 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 16:54 hnowlan@puppetmaster1001: conftool action : set/weight=10; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: Depool from primary DC following network maintenance
  • 16:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
  • 16:47 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs4008.ulsfo.wmnet with OS bullseye
  • 16:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 16:47 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 16:47 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: Depool from primary DC following network maintenance
  • 16:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 16:37 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 16:36 hnowlan@puppetmaster1001: conftool action : set/weight=6; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:30 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:20 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
  • 16:18 hnowlan@puppetmaster1001: conftool action : set/weight=8; selector: service=thumbor,name=thumbor100[1256].eqiad.wmnet
  • 16:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 16:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 16:02 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1010.eqiad.wmnet with OS bullseye
  • 15:55 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:50 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 15:47 brett: Disable Puppet/PyBal on lvs4008 in preparation for reimaging - T321309
  • 15:44 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
  • 15:42 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:42 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:41 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1010.eqiad.wmnet with reason: host reimage
  • 15:39 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 15:31 moritzm: restarting FPM on mediawiki canaries to pick up pcre security update
  • 15:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=8; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:27 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1010.eqiad.wmnet with OS bullseye
  • 15:25 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:21 moritzm: installing pcre2 security updates on buster
  • 15:21 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=7; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:16 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 15:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Revert "VisualEditorFeatureUse sampling rate to 1 everywhere" (duration: 07m 42s)
  • 15:14 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:10 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1009.eqiad.wmnet with OS bullseye
  • 15:09 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for Revert "VisualEditorFeatureUse sampling rate to 1 everywhere" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 15:09 moritzm: installing nodejs security updates on buster
  • 15:09 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:08 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:07 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Revert "VisualEditorFeatureUse sampling rate to 1 everywhere"
  • 15:05 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:03 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:03 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
  • 14:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1009.eqiad.wmnet with reason: host reimage
  • 14:48 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:48 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:48 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 14:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1009.eqiad.wmnet with OS bullseye
  • 14:33 elukey: restart kafka on kafka-main1005 to pick up the new TLS certificate (PKI based) - T319372
  • 14:31 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1008.eqiad.wmnet with OS bullseye
  • 14:31 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 14:30 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1005.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 14:14 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage
  • 14:14 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:14 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:11 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1008.eqiad.wmnet with reason: host reimage
  • 14:00 elukey: powercycle an-worker1132
  • 13:58 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1008.eqiad.wmnet with OS bullseye
  • 13:57 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1010.eqiad.wmnet
  • 13:54 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 13:54 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 13:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 13:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 13:52 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1010.eqiad.wmnet
  • 13:52 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1009.eqiad.wmnet
  • 13:52 elukey: restart kafka on kafka-main1004 to pick up the new TLS certificate (PKI based) - T319372
  • 13:49 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 13:48 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1004.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 13:48 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1009.eqiad.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for VisualEditorFeatureUse sampling rate to 1 everywhere (T333168) (duration: 14m 47s)
  • 13:33 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for VisualEditorFeatureUse sampling rate to 1 everywhere (T333168) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:31 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for VisualEditorFeatureUse sampling rate to 1 everywhere (T333168)
  • 13:29 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for mediawiki.edit_attempt: Ignore events from PHP MPC (T309985) (duration: 10m 52s)
  • 13:28 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:28 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46079 and previous config saved to /var/cache/conftool/dbconfig/20230405-132318-root.json
  • 13:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and phuedx: Backport for mediawiki.edit_attempt: Ignore events from PHP MPC (T309985) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:18 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for mediawiki.edit_attempt: Ignore events from PHP MPC (T309985)
  • 13:17 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134) (duration: 08m 00s)
  • 13:16 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:15 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:14 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:10 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde and sgimeno: Backport for GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:09 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for GrowthExperiments: enable add link backend in wiki rounds (8,9th) (T308133 T308134)
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46078 and previous config saved to /var/cache/conftool/dbconfig/20230405-130813-root.json
  • 13:03 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1008.eqiad.wmnet
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46077 and previous config saved to /var/cache/conftool/dbconfig/20230405-130315-root.json
  • 13:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46076 and previous config saved to /var/cache/conftool/dbconfig/20230405-130121-root.json
  • 12:58 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1008.eqiad.wmnet
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46075 and previous config saved to /var/cache/conftool/dbconfig/20230405-125308-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46074 and previous config saved to /var/cache/conftool/dbconfig/20230405-124810-root.json
  • 12:46 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46073 and previous config saved to /var/cache/conftool/dbconfig/20230405-124616-root.json
  • 12:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46072 and previous config saved to /var/cache/conftool/dbconfig/20230405-123804-root.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46071 and previous config saved to /var/cache/conftool/dbconfig/20230405-123305-root.json
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46070 and previous config saved to /var/cache/conftool/dbconfig/20230405-123111-root.json
  • 12:27 moritzm: installing xapian-core security updates
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46069 and previous config saved to /var/cache/conftool/dbconfig/20230405-122259-root.json
  • 12:20 samtar@deploy2002: Finished scap: Backport for Remove WikiEditor's Realtime Preview config vars (T327515) (duration: 07m 41s)
  • 12:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46068 and previous config saved to /var/cache/conftool/dbconfig/20230405-121801-root.json
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46067 and previous config saved to /var/cache/conftool/dbconfig/20230405-121606-root.json
  • 12:13 samtar@deploy2002: samwilson and samtar: Backport for Remove WikiEditor's Realtime Preview config vars (T327515) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 12:12 samtar@deploy2002: Started scap: Backport for Remove WikiEditor's Realtime Preview config vars (T327515)
  • 12:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46066 and previous config saved to /var/cache/conftool/dbconfig/20230405-120754-root.json
  • 12:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46065 and previous config saved to /var/cache/conftool/dbconfig/20230405-120256-root.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46064 and previous config saved to /var/cache/conftool/dbconfig/20230405-120101-root.json
  • 11:54 moritzm: installing apache2 security updates on buster
  • 11:53 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46063 and previous config saved to /var/cache/conftool/dbconfig/20230405-115249-root.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46062 and previous config saved to /var/cache/conftool/dbconfig/20230405-114751-root.json
  • 11:47 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2004.codfw.wmnet with OS bullseye
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46061 and previous config saved to /var/cache/conftool/dbconfig/20230405-114557-root.json
  • 11:45 TheresNoTime: `[samtar@mwmaint2002 ~]$ echo 'https://en.wikipedia.org/robots.txt' | mwscript purgeList.php` T334038
  • 11:40 samtar@deploy2002: Finished scap: Backport for Remove possibly significant whitespace from robots.txt (T334038) (duration: 07m 14s)
  • 11:38 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46060 and previous config saved to /var/cache/conftool/dbconfig/20230405-113745-root.json
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw1414.eqiad.wmnet
  • 11:34 samtar@deploy2002: legoktm and samtar: Backport for Remove possibly significant whitespace from robots.txt (T334038) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 11:34 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 11:33 samtar@deploy2002: Started scap: Backport for Remove possibly significant whitespace from robots.txt (T334038)
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 4%: Repooling', diff saved to https://phabricator.wikimedia.org/P46059 and previous config saved to /var/cache/conftool/dbconfig/20230405-113246-root.json
  • 11:31 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2004.codfw.wmnet with reason: host reimage
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1120 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46058 and previous config saved to /var/cache/conftool/dbconfig/20230405-113052-root.json
  • 11:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P46057 and previous config saved to /var/cache/conftool/dbconfig/20230405-113031-root.json
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mw1414.eqiad.wmnet
  • 11:28 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:24 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:23 ladsgroup@deploy2002: Finished scap: Backport for Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800) (duration: 08m 45s)
  • 11:23 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46056 and previous config saved to /var/cache/conftool/dbconfig/20230405-112240-root.json
  • 11:22 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2004.codfw.wmnet with OS bullseye
  • 11:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 3%: Repooling', diff saved to https://phabricator.wikimedia.org/P46055 and previous config saved to /var/cache/conftool/dbconfig/20230405-111742-root.json
  • 11:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:16 ladsgroup@deploy2002: ladsgroup: Backport for Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P46054 and previous config saved to /var/cache/conftool/dbconfig/20230405-111527-root.json
  • 11:15 ladsgroup@deploy2002: Started scap: Backport for Revert "Revert "Revert "Revert "mwscript: Switch to use run.php"""" (T326800)
  • 11:14 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:12 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:12 moritzm: installing systemd security updates on buster
  • 11:12 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 11:10 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:07 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1100 with 1% weight', diff saved to https://phabricator.wikimedia.org/P46053 and previous config saved to /var/cache/conftool/dbconfig/20230405-110717-root.json
  • 11:05 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 primary T331302', diff saved to https://phabricator.wikimedia.org/P46052 and previous config saved to /var/cache/conftool/dbconfig/20230405-110530-root.json
  • 11:05 marostegui: Starting s5 eqiad failover from db1100 to db1130 - T331302
  • 11:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 2%: Repooling', diff saved to https://phabricator.wikimedia.org/P46051 and previous config saved to /var/cache/conftool/dbconfig/20230405-110237-root.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P46050 and previous config saved to /var/cache/conftool/dbconfig/20230405-110022-root.json
  • 11:00 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 11:00 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 10:59 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 10:59 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:56 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:56 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 10:50 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:50 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:50 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:49 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:48 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:48 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:47 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 10:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1107 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46049 and previous config saved to /var/cache/conftool/dbconfig/20230405-104732-root.json
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P46048 and previous config saved to /var/cache/conftool/dbconfig/20230405-104517-root.json
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T331302', diff saved to https://phabricator.wikimedia.org/P46047 and previous config saved to /var/cache/conftool/dbconfig/20230405-104422-marostegui.json
  • 10:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T331302
  • 10:43 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T331302
  • 10:43 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:41 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:40 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P46046 and previous config saved to /var/cache/conftool/dbconfig/20230405-103012-root.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1120 T326669', diff saved to https://phabricator.wikimedia.org/P46044 and previous config saved to /var/cache/conftool/dbconfig/20230405-102215-marostegui.json
  • 10:20 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 10:17 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 10:17 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 10:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P46043 and previous config saved to /var/cache/conftool/dbconfig/20230405-101507-root.json
  • 10:14 elukey: restart purged on cp5032, cp1082, cp6004, cp1090 - errors after restart of kafka main eqiad brokers
  • 10:12 elukey: restart purged on cp6015 to verify whether the broker connection failures are only temporary
  • 10:11 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1007.eqiad.wmnet with OS bullseye
  • 10:09 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 10:06 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 10:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P46041 and previous config saved to /var/cache/conftool/dbconfig/20230405-100003-root.json
  • 09:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1122', diff saved to https://phabricator.wikimedia.org/P46040 and previous config saved to /var/cache/conftool/dbconfig/20230405-095954-marostegui.json
  • 09:57 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 09:56 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1162 to s2 primary T334067', diff saved to https://phabricator.wikimedia.org/P46039 and previous config saved to /var/cache/conftool/dbconfig/20230405-095600-root.json
  • 09:55 marostegui: Starting s2 eqiad failover from db1122 to db1162 - T334067
  • 09:54 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
  • 09:51 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1007.eqiad.wmnet with reason: host reimage
  • 09:42 slyngshede@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host testvm2002.codfw.wmnet with OS bullseye
  • 09:36 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1007.eqiad.wmnet with OS bullseye
  • 09:35 elukey: restart kafka on kafka-main1003 to pick up the new TLS certificate (PKI based) - T319372
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM kafka-test1007.eqiad.wmnet
  • 09:34 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1003.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:34 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1003.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1162 with weight 0 T334067', diff saved to https://phabricator.wikimedia.org/P46038 and previous config saved to /var/cache/conftool/dbconfig/20230405-093155-marostegui.json
  • 09:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T334067
  • 09:30 elukey@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM kafka-test1007.eqiad.wmnet
  • 09:29 slyngshede@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T334067
  • 09:26 slyngshede@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2002.codfw.wmnet with reason: host reimage
  • 09:15 slyngshede@cumin1001: START - Cookbook sre.ganeti.reimage for host testvm2002.codfw.wmnet with OS bullseye
  • 08:58 hashar@deploy2002: Synchronized php: group1 wikis to 1.41.0-wmf.3 refs T330209 (duration: 05m 46s)
  • 08:52 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.3 refs T330209
  • 08:39 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1003.eqiad.wmnet,service=thanos-web
  • 08:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2067.codfw.wmnet with OS bullseye
  • 08:27 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kafka-test1006.eqiad.wmnet with OS bullseye
  • 08:25 hashar@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Remove akwiki from CX config (take 2, it was not fully deployed due to a scap lock issue on the spare server) (duration: 06m 06s)
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1107 T326669', diff saved to https://phabricator.wikimedia.org/P46036 and previous config saved to /var/cache/conftool/dbconfig/20230405-082240-root.json
  • 08:09 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
  • 08:07 elukey: restart kafka on kafka-main1002 to pick up the new TLS certificate (PKI based) - T319372
  • 08:06 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-test1006.eqiad.wmnet with reason: host reimage
  • 08:02 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1002.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1002.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1104.eqiad.wmnet
  • 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:59 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1104.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:56 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1006.eqiad.wmnet with OS bullseye
  • 07:54 elukey@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kafka-test1006.eqiad.wmnet with OS bullseye
  • 07:54 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1104.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:52 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:49 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1104.eqiad.wmnet
  • 07:46 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2067.codfw.wmnet with reason: host reimage
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1104 from dbctl T329481', diff saved to https://phabricator.wikimedia.org/P46035 and previous config saved to /var/cache/conftool/dbconfig/20230405-073102-marostegui.json
  • 07:30 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2067.codfw.wmnet with OS bullseye
  • 07:24 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host kafka-test1006.eqiad.wmnet with OS bullseye
  • 07:20 marostegui: Stop mariadb on db1101 T331381
  • 07:11 kartik@deploy2002: Finished scap: Backport for Remove akwiki from CX config (duration: 07m 22s)
  • 07:11 marostegui: Failover m5-master T333377
  • 07:05 kartik@deploy2002: kartik: Backport for Remove akwiki from CX config synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 07:04 kartik@deploy2002: Started scap: Backport for Remove akwiki from CX config
  • 07:03 marostegui: Failover m3-master T333377
  • 04:17 TimStarling: restarted swift-proxy on ms-fe* T328872

2023-04-04

  • 23:40 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:34 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:25 tstarling@deploy2002: Synchronized src/Profiler.php: re-enable excimer T331882 (duration: 06m 25s)
  • 23:21 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1002.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:21 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:00 jclark@cumin1001: START - Cookbook sre.hosts.provision for host cloudvirtlocal1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:58 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:58 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudvirtlocal - jclark@cumin1001"
  • 22:57 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns cloudvirtlocal - jclark@cumin1001"
  • 22:55 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 22:33 cstone: civicrm upgraded from 4231191f to 223f655a
  • 22:26 mutante: deploying change to block scap execution on inactive deployment server via gerrit:904502 T330756
  • 22:19 ejegg: payments-wiki upgraded from 49a2e104 to 75b068a1
  • 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts miscweb2002.codfw.wmnet
  • 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 21:37 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: miscweb2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - dzahn@cumin1001"
  • 21:26 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 21:22 dzahn@cumin1001: START - Cookbook sre.hosts.decommission for hosts miscweb2002.codfw.wmnet
  • 20:56 sbassett: Deployed mitigation for T333140
  • 20:44 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on miscweb2002.codfw.wmnet with reason: decom
  • 20:44 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on miscweb2002.codfw.wmnet with reason: decom
  • 20:44 TheresNoTime: closing UTC late backport window
  • 20:38 samtar@deploy2002: Finished scap: Backport for Clean up history page visual diffs beta feature config (T333448) (duration: 06m 42s)
  • 20:33 samtar@deploy2002: matmarex and samtar: Backport for Clean up history page visual diffs beta feature config (T333448) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:31 samtar@deploy2002: Started scap: Backport for Clean up history page visual diffs beta feature config (T333448)
  • 20:27 samtar@deploy2002: Finished scap: Backport for EditCheck: catch errors from TransactionSquasher (T324733) (duration: 08m 23s)
  • 20:23 inflatador: bking@cumin1001 unban elastic nodes post switch maintenance T331882
  • 20:20 samtar@deploy2002: matmarex and samtar: Backport for EditCheck: catch errors from TransactionSquasher (T324733) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:18 samtar@deploy2002: Started scap: Backport for EditCheck: catch errors from TransactionSquasher (T324733)
  • 20:11 samtar@deploy2002: Finished scap: Backport for Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733) (duration: 07m 30s)
  • 20:10 mutante: deploying ATS config change on cp2* for query.wikidata.org
  • 20:06 ryankemper: T331896 Running puppet on wcqs fleet to pick up new miscweb gui_url: `ryankemper@cumin1001:~$ sudo -E cumin -b 2 'wcqs*' 'run-puppet-agent'`
  • 20:05 samtar@deploy2002: matmarex and samtar: Backport for Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:03 samtar@deploy2002: Started scap: Backport for Revert "Revert "Enable hidden tag for "Edit Check" project on Wikipedias"" (T324733)
  • 20:03 mutante: running puppet on cp5*, cp4*...
  • 20:00 ryankemper: T331896 Running puppet on wdqs fleet to pick up new miscweb gui_url: `ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'run-puppet-agent'`
  • 19:58 hashar@deploy2002: Finished deploy [gerrit/gerrit@dbaaa7a]: wm-zuul-status: change pending jobs SUCCESS > INFO | T214068 (duration: 00m 07s)
  • 19:58 hashar@deploy2002: Started deploy [gerrit/gerrit@dbaaa7a]: wm-zuul-status: change pending jobs SUCCESS > INFO | T214068
  • 19:55 mutante: https://query.wikidata.org and WCQS GUIs are switching to new backend VMs on bullseye in codfw T330090 T331896
  • 19:46 hashar@deploy2002: Finished scap: Backport for Replace usages of Hooks::register() (T334005) (duration: 06m 55s)
  • 19:40 hashar@deploy2002: hashar: Backport for Replace usages of Hooks::register() (T334005) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 19:39 hashar@deploy2002: Started scap: Backport for Replace usages of Hooks::register() (T334005)
  • 19:10 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.3 refs T330209
  • 18:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 18:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 17:22 ladsgroup@deploy2002: Finished scap: Backport for Revert "mergeMessageFileList.php: move code out of file scope." (T333966) (duration: 38m 18s)
  • 17:04 ladsgroup@deploy2002: ladsgroup: Backport for Revert "mergeMessageFileList.php: move code out of file scope." (T333966) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 16:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:44 ladsgroup@deploy2002: Started scap: Backport for Revert "mergeMessageFileList.php: move code out of file scope." (T333966)
  • 16:37 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:37 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 16:17 ladsgroup@deploy2002: Finished scap: Backport for Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961) (duration: 07m 31s)
  • 16:11 ladsgroup@deploy2002: ladsgroup: Backport for Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:10 ladsgroup@deploy2002: Started scap: Backport for Revert "external store: Depool es4 (cluster26) from writes for maintenance" (T333961)
  • 16:07 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1021 for reads', diff saved to https://phabricator.wikimedia.org/P46031 and previous config saved to /var/cache/conftool/dbconfig/20230404-160702-jynus.json
  • 16:01 jynus@cumin1001: dbctl commit (dc=all): 'Repool es1021 for reads (only 10%)', diff saved to https://phabricator.wikimedia.org/P46030 and previous config saved to /var/cache/conftool/dbconfig/20230404-160146-jynus.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1022.eqiad.wmnet with reason: T333961
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on es1022.eqiad.wmnet with reason: T333961
  • 15:58 jynus: restart es1021, several connections in a "stuck" state T333961
  • 15:50 dancy@deploy2002: Installation of scap version "4.48.0" completed for 592 hosts
  • 15:49 dancy@deploy2002: Installing scap version "4.48.0" for 592 hosts
  • 15:31 jynus: restart es1021, several connections in a "stuck" state T333961
  • 15:25 jynus@cumin1001: dbctl commit (dc=all): 'Depool es1021 reads', diff saved to https://phabricator.wikimedia.org/P46029 and previous config saved to /var/cache/conftool/dbconfig/20230404-152501-jynus.json
  • 15:23 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:19 jiji@cumin1001: END (FAIL) - Cookbook sre.discovery.datacenter (exit_code=93) pool all active/active services in eqiad: eqiad row C switches upgrade - T331882
  • 15:18 ladsgroup@deploy2002: Finished scap: Backport for external store: Depool es4 (cluster26) from writes for maintenance (T333961) (duration: 11m 30s)
  • 15:16 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1150.eqiad.wmnet with reason: pending s3 reprovisioning
  • 15:16 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1150.eqiad.wmnet with reason: pending s3 reprovisioning
  • 15:12 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:08 ladsgroup@deploy2002: ladsgroup and jynus: Backport for external store: Depool es4 (cluster26) from writes for maintenance (T333961) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:06 ladsgroup@deploy2002: Started scap: Backport for external store: Depool es4 (cluster26) from writes for maintenance (T333961)
  • 14:54 urbanecm: [urbanecm@mwmaint2002 /srv/mediawiki/php]$ mwscript extensions/CentralAuth/maintenance/migrateAccount.php --wiki=metawiki -u 'Translation Notification Bot (T255246)' --auto # T255246
  • 14:43 jiji@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: eqiad row C switches upgrade - T331882
  • 14:39 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:39 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:38 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 14:38 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 14:38 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:37 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:36 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:36 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:28 vgutierrez: switch cp6008 (upload) and cp6016 (text) to use a single UDS socket between haproxy and varnish - T333965
  • 14:21 jynus: stop es1022 for debugging T333961
  • 14:15 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:15 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Use HookContainer to register hooks inside hooks (T333926) (duration: 10m 50s)
  • 14:10 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1018.eqiad.wmnet
  • 14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1013.eqiad.wmnet
  • 14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=aqs1012.eqiad.wmnet
  • 14:09 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.debug (exit_code=0) for Netbox circuit ID 33
  • 14:09 ayounsi@cumin1001: START - Cookbook sre.network.debug for Netbox circuit ID 33
  • 14:09 stevemunene@puppetmaster1001: conftool action : set/pooled=yes; selector: name=datahubsearch1003.eqiad.wmnet
  • 14:05 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Use HookContainer to register hooks inside hooks (T333926) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 14:04 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Use HookContainer to register hooks inside hooks (T333926)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool es1022 T333961', diff saved to https://phabricator.wikimedia.org/P46027 and previous config saved to /var/cache/conftool/dbconfig/20230404-134415-ladsgroup.json
  • 13:42 Emperor: repool thanos-fe1003 re T331882
  • 13:41 Emperor: repool ms-fe1011 re T331882
  • 13:38 steve_munene: leave hdfs safemode T331882
  • 13:38 inflatador: reboot elastic2038 to clear soft lock
  • 13:34 sukhe: run authdns-update for CR 905612, reverting depool of eqiad
  • 13:30 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=thumbor1006.eqiad.wmnet
  • 13:25 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 13:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 13:11 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1009.eqiad.wmnet
  • 13:11 XioNoX: asw2-c-eqiad> request system reboot all-members - T331882
  • 13:10 urbanecm@deploy2002: Finished scap: Backport for ckbwiktionary: Add logo (T331831) (duration: 07m 00s)
  • 13:05 akosiaris@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all active/active services in eqiad: eqiad row C switches upgrade - T331882
  • 13:03 urbanecm@deploy2002: Started scap: Backport for ckbwiktionary: Add logo (T331831)
  • 13:02 ayounsi@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 227 hosts with reason: eqiad row C upgrade
  • 12:57 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 227 hosts with reason: eqiad row C upgrade
  • 12:57 steve_munene: putting hdfs into safe mode as part of T331882
  • 12:52 ayounsi@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on 228 hosts with reason: eqiad row C upgrade
  • 12:52 ayounsi@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 228 hosts with reason: eqiad row C upgrade
  • 12:44 akosiaris@cumin1001: START - Cookbook sre.discovery.datacenter depool all active/active services in eqiad: eqiad row C switches upgrade - T331882
  • 12:43 Emperor: depool thanos-fe1003 re T331882
  • 12:38 Emperor: depool ms-fe1011 re T331882
  • 12:32 sukhe: [finished] run authdns-update for CR: 905603 depool eqiad
  • 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on 38 hosts with reason: Row c switch maint T331882
  • 12:31 sukhe: run authdns-update for CR: 905603 depool eqiad
  • 12:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on 38 hosts with reason: Row c switch maint T331882
  • 12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1018.eqiad.wmnet
  • 12:28 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox
  • 12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1013.eqiad.wmnet
  • 12:28 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox
  • 12:28 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=aqs1012.eqiad.wmnet
  • 12:28 volans@cumin1001: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling update on A:netbox-canary
  • 12:27 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 12:26 stevemunene@puppetmaster1001: conftool action : set/pooled=no; selector: name=datahubsearch1003.eqiad.wmnet
  • 12:24 TimStarling: I noticed that mw2382 was still talking to mwlog1002. It still had old php-fpm7.4 processes despite the scap. So I manually restarted php-fpm on it.
  • 12:17 tstarling@deploy2002: Synchronized src/Profiler.php: T331882 disable profiling for switch maintenance (duration: 05m 58s)
  • 11:35 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:24 moritzm: installing joblib security updates
  • 10:17 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=5; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 09:51 hashar@deploy2002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.41.0-wmf.3" | T330209
  • 09:42 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.3 refs T330209
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46025 and previous config saved to /var/cache/conftool/dbconfig/20230404-091639-ladsgroup.json
  • 09:19 hashar@deploy2002: Pruned MediaWiki: 1.41.0-wmf.1 (duration: 02m 16s)
  • 09:13 hashar@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.3 refs T330209 (duration: 40m 20s)
  • 09:09 moritzm: installing libmicrohttpd security updates
  • 09:07 moritzm: installing libdatetime-timezone-perl updates
  • 09:04 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:04 akosiaris@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:04 akosiaris@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:04 akosiaris@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:03 akosiaris@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:03 akosiaris@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:03 akosiaris@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:03 akosiaris@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:03 akosiaris@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:02 akosiaris@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:02 akosiaris@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:02 akosiaris@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:02 akosiaris@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:02 akosiaris@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 09:01 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 09:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P46024 and previous config saved to /var/cache/conftool/dbconfig/20230404-090133-ladsgroup.json
  • 09:01 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P46023 and previous config saved to /var/cache/conftool/dbconfig/20230404-085553-ladsgroup.json
  • 08:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 08:53 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 08:46 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:46 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P46022 and previous config saved to /var/cache/conftool/dbconfig/20230404-084627-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P46021 and previous config saved to /var/cache/conftool/dbconfig/20230404-084048-ladsgroup.json
  • 08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 08:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 08:32 hashar@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.3 refs T330209
  • 08:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46020 and previous config saved to /var/cache/conftool/dbconfig/20230404-083120-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T333332)', diff saved to https://phabricator.wikimedia.org/P46019 and previous config saved to /var/cache/conftool/dbconfig/20230404-082911-ladsgroup.json
  • 08:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:28 hashar: Deleting mediawiki/core branch `wmf/branch_cut_pretest` pointing at `430d25d1a1858edfa4a6199dfe1f0eb3743a219a` # T330209
  • 08:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P46017 and previous config saved to /var/cache/conftool/dbconfig/20230404-082543-ladsgroup.json
  • 08:25 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 08:22 godog: upgrade grafana* to grafana 9.3.11 - T333915
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P46016 and previous config saved to /var/cache/conftool/dbconfig/20230404-081039-ladsgroup.json
  • 08:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 08:01 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 08:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 08:00 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 07:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1162 T333918', diff saved to https://phabricator.wikimedia.org/P46015 and previous config saved to /var/cache/conftool/dbconfig/20230404-074848-ladsgroup.json
  • 07:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1122 to s2 primary T333918', diff saved to https://phabricator.wikimedia.org/P46014 and previous config saved to /var/cache/conftool/dbconfig/20230404-074656-ladsgroup.json
  • 07:46 Amir1: Starting s2 eqiad failover from db1162 to db1122 - T333918
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2001.codfw.wmnet
  • 07:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2001.codfw.wmnet
  • 07:35 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2002.codfw.wmnet
  • 07:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2002.codfw.wmnet
  • 07:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1122 with weight 0 T333918', diff saved to https://phabricator.wikimedia.org/P46013 and previous config saved to /var/cache/conftool/dbconfig/20230404-072817-ladsgroup.json
  • 07:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T333918
  • 07:27 hashar@deploy2002: Finished deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs (duration: 00m 08s)
  • 07:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T333918
  • 07:27 hashar@deploy2002: Started deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs
  • 07:23 hashar@deploy2002: Finished deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs (duration: 00m 05s)
  • 07:23 hashar@deploy2002: Started deploy [gerrit/gerrit@453b038]: Gerrit plugin update and switching from git-fat to git-lfs
  • 06:09 XioNoX: stage new Junos on asw2-c-eqiad - T331882

2023-04-03

  • 21:53 ryankemper: T331896 `sudo -E cumin -b 4 'wdqs*' 'sudo run-puppet-agent'`
  • 21:42 maryum: undeployed mitigation for T333140
  • 21:25 inflatador: bking@cumin ban cloudelastic1003 from all cloudelastic clusters T331882
  • 21:22 maryum: deployed mitigation for T333140
  • 21:17 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 10 hosts with reason: T331882 eqiad row C maint
  • 21:16 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 10 hosts with reason: T331882 eqiad row C maint
  • 21:12 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs1003.eqiad.wmnet,wdqs[1010,1013-1014].eqiad.wmnet with reason: T331882 eqiad row C maint
  • 21:12 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs1003.eqiad.wmnet,wdqs[1010,1013-1014].eqiad.wmnet with reason: T331882 eqiad row C maint
  • 20:37 kindrobot: close UTC late backport window
  • 20:36 kindrobot@deploy2002: Finished scap: Backport for make "advanced mode" default on ptwikinews mobile (T290812) (duration: 10m 47s)
  • 20:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS bullseye
  • 20:26 kindrobot@deploy2002: jdlrobson and kindrobot: Backport for make "advanced mode" default on ptwikinews mobile (T290812) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:25 kindrobot@deploy2002: Started scap: Backport for make "advanced mode" default on ptwikinews mobile (T290812)
  • 20:19 kindrobot@deploy2002: Finished scap: Backport for [refactor] split out Minerva configuration from main config, Disable Vector js/css sharing on pl.wikipedia (T332809) (duration: 12m 05s)
  • 20:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 20:08 kindrobot@deploy2002: kindrobot and jdlrobson: Backport for [refactor] split out Minerva configuration from main config, Disable Vector js/css sharing on pl.wikipedia (T332809) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:07 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 20:07 kindrobot@deploy2002: Started scap: Backport for [refactor] split out Minerva configuration from main config, Disable Vector js/css sharing on pl.wikipedia (T332809)
  • 20:03 kindrobot: start UTC late backport window
  • 19:41 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bullseye
  • 19:38 brett@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host lvs5006.eqsin.wmnet with OS bullseye
  • 19:36 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:35 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:09 cwhite: manually upgrade vopsbot on alert2001 to version 0.3.3
  • 18:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 18:55 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
  • 18:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bullseye
  • 18:14 brett: Disable Puppet/PyBal on lvs5006 in preparation for reimaging - T321309
  • 16:02 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 15:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 15:52 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 05m 33s)
  • 15:51 cstone: payments-wiki upgraded from 60d0aed5 to 49a2e104
  • 15:46 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 14s)
  • 15:37 volans: restarted sirenbot (vopsbot) on alert2001 (msg="could not find the topic for this channel stored. Is the bot in the channel?")
  • 15:36 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@04b4841]: (no justification provided) (duration: 00m 12s)
  • 15:36 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@04b4841]: (no justification provided)
  • 15:30 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 15:30 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 15:27 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 15:26 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 15:12 sukhe: rolling restart of bird.service on doh* and not doh2002
  • 15:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 15:07 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 15:05 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@fabc2cf]: Deploy refine webrequest job on analytics_test to fix matching Oozie job (duration: 00m 11s)
  • 15:04 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@fabc2cf]: Deploy refine webrequest job on analytics_test to fix matching Oozie job
  • 14:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Investigate service failures from bullseye upgrade
  • 14:30 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-test-worker1001.eqiad.wmnet with reason: Investigate service failures from bullseye upgrade
  • 13:50 claime: Testing deploy server dsh group inclusion - T329857
  • 13:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1075.eqiad.wmnet']
  • 13:47 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1074.eqiad.wmnet']
  • 13:46 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:45 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:44 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:42 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 13:35 taavi@deploy2002: Finished scap: Backport for GrowthExperiments: add link backend amends (T308133) (duration: 07m 15s)
  • 13:34 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1073']
  • 13:32 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be1072']
  • 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11062
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 11062
  • 13:29 taavi@deploy2002: sgimeno and taavi: Backport for GrowthExperiments: add link backend amends (T308133) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:28 taavi@deploy2002: Started scap: Backport for GrowthExperiments: add link backend amends (T308133)
  • 13:25 taavi@deploy2002: Finished scap: Backport for Enable visual enhancements on pages using on huwiki (T333570) (duration: 16m 06s)
  • 13:18 taavi@deploy2002: matmarex and taavi: Backport for Enable visual enhancements on pages using on huwiki (T333570) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:09 taavi@deploy2002: Started scap: Backport for Enable visual enhancements on pages using on huwiki (T333570)
  • 12:55 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:54 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 12:11 jbond@cumin2002: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=netbox
  • 12:02 jbond: testing netbox failover cookbook
  • 12:02 jbond@cumin1001: conftool action : set/pooled=false; selector: name=codfw,dnsdisc=netbox
  • 11:31 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:31 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:31 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:31 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:29 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:29 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:06 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:04 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 11:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:58 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:35 vgutierrez: Extend the ESI test to text@eqsin, revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/905173/ if this gives any issue - T308799
  • 10:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:26 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1073.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:23 jclark@cumin1001: START - Cookbook sre.hosts.provision for host ms-be1072.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:20 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 10:19 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['ms-be1072.eqiad.wmnet']
  • 09:19 elukey: move kafka-jumbo1006's kafka broker cert to PKI - T296064
  • 09:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1006.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 09:19 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1006.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:54 elukey: move kafka-jumbo1009's kafka broker cert to PKI - T296064
  • 08:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1009.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:53 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1009.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 08:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 08:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:31 vgutierrez: rolling upgrade to HAProxy 2.6.12 in A:cp-ulsfo
  • 08:29 elukey: move kafka-main1001's kafka broker to PKI - T319372
  • 08:26 vgutierrez: fetch HAProxy 2.6.12 on thirdparty/haproxy26 for bullseye (apt.wm.o)
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-main1001.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:23 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-main1001.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:03 elukey: move kafka-jumbo1008's kafka broker cert to PKI - T296064
  • 08:03 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1008.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 08:02 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1008.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 07:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1007.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 07:43 elukey: move kafka-jumbo1007's kafka broker cert to PKI - T296064
  • 06:53 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on kafka-jumbo1005.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 06:52 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:15:00 on kafka-jumbo1005.eqiad.wmnet with reason: restart kafka, switch to PKI
  • 06:52 elukey: move kafka-jumbo1005's kafka broker cert to PKI - T296064

2023-04-01

  • 00:13 denisse@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host prometheus5002.eqsin.wmnet with OS bullseye
