Server Admin Log

From Wikitech

2023-03-22

2023-03-21

  • 23:46 zabe@deploy2002: Finished scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) (duration: 30m 08s)
  • 23:35 zabe@deploy2002: zabe: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:15 zabe@deploy2002: Started scap: Backport for Add messages for Angika Wikipedia (anpwiki) (T332115), Add messages for Central Kurdish Wiktionary (ckbwiktionary) (T331831)
  • 23:07 zabe@deploy2002: Finished scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers" (duration: 07m 10s)
  • 23:00 zabe@deploy2002: Started scap: Revert "dewiki: Allow 'crats to remove sysopship and manage importers"
  • 22:47 ejegg: payments-wiki upgraded from 0fd66b1f to ab0a55a2
  • 22:10 urbanecm@deploy2002: Finished scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) (duration: 07m 15s)
  • 22:04 urbanecm@deploy2002: urbanecm: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:03 urbanecm@deploy2002: Started scap: Backport for [Growth] eswiki: Enable mentorship for 35% newcomers (T332737 T285235)
  • 21:30 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 21:21 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 21:02 AndyRussG: update SmashPig config 6e651fd4 -> 035f602a
  • 20:58 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 20:48 taavi: start T315510 migration script on group2 s7 wikis
  • 20:39 taavi@deploy2002: Finished scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config (duration: 09m 01s)
  • 20:31 taavi@deploy2002: matmarex and taavi: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:30 taavi@deploy2002: Started scap: Backport for Simplify/Fix wgDiscussionToolsEnablePermalinksBackend config
  • 20:20 taavi@deploy2002: Finished scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) (duration: 17m 40s)
  • 20:10 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 20:09 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 20:04 taavi@deploy2002: esanders and taavi and matmarex: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:02 taavi@deploy2002: Started scap: Backport for Enable DiscussionTools_visualenhancements_newsectionlink_enable on labs for testing, Enable wgDiscussionToolsEnablePermalinksBackend on group2 wikis (T315353)
  • 19:52 stevemunene@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 19:44 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:43 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:41 jhathaway@cumin1001: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host dborch1002.wikimedia.org with OS bullseye
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 19:09 dancy@deploy2002: Installation of scap version "4.47.1" completed for 587 hosts
  • 19:07 dancy@deploy2002: Installing scap version "4.47.1" for 587 hosts
  • 19:04 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
  • 19:03 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag (duration: 00m 14s)
  • 19:03 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e7b1d0b]: initial deployment of glent dag
  • 19:01 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
  • 18:52 jhathaway@cumin1001: START - Cookbook sre.ganeti.reimage for host dborch1002.wikimedia.org with OS bullseye
  • 18:38 stevemunene@cumin1001: START - Cookbook sre.ganeti.reimage for host an-test-client1002.eqiad.wmnet with OS bullseye
  • 18:36 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.1 refs T330207
  • 18:00 AndyRussG: update SmashPig config 59a8b2d2 -> 6e651fd
  • 17:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host dborch1002.wikimedia.org
  • 17:40 joal@deploy2002: Finished deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b] (duration: 00m 11s)
  • 17:39 joal@deploy2002: Started deploy [airflow-dags/analytics@e7b1d0b]: Fix analytics HDFSArchiver tasks [airflow-dags/analytics@e7b1d0b]
  • 17:25 stevemunene@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host an-test-client1002.eqiad.wmnet
  • 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:07 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:53 mutante: sudo cumin -b 4 -s 40 'C:role::cache::text' 'run-puppet-agent'
  • 16:50 jbond: copy /usr/bin/prometheus-ipmi-exporter from bullseye to buster
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 16:46 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
  • 16:45 jhathaway@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - jhathaway@cumin1001"
  • 16:43 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:43 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
  • 16:33 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:30 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:28 jbond: upload prometheus-ipmi-exporter_1.6.1 to bullseye
  • 16:15 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-test-client1002.eqiad.wmnet on all recursors
  • 16:15 stevemunene@cumin1001: START - Cookbook sre.dns.wipe-cache an-test-client1002.eqiad.wmnet on all recursors
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
  • 16:13 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM an-test-client1002.eqiad.wmnet - stevemunene@cumin1001"
  • 16:10 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 16:10 stevemunene@cumin1001: START - Cookbook sre.ganeti.makevm for new host an-test-client1002.eqiad.wmnet
  • 15:57 jynus: running from cumin1001: transfer.py --type=decompress dbprov1003.eqiad.wmnet:/srv/backups/snapshots/latest/snapshot.s5.2023-03-20--04-00-30.tar.gz db1145.eqiad.wmnet:/srv/sqldata.s5
  • 15:53 jhathaway@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.wikimedia.org
  • 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 15:53 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 15:53 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 15:52 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
  • 15:52 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
  • 15:52 jhathaway@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:52 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:51 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 15:51 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
  • 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:42 jbond: stop puppet from deploying this further
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:34 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:32 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
  • 15:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 15:26 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1005.eqiad.wmnet with reason: host reimage
  • 15:26 samtar@deploy2002: Finished scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) (duration: 09m 11s)
  • 15:22 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:19 samtar@deploy2002: samtar: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:17 samtar@deploy2002: Started scap: Backport for InitialiseSettings: Set wgAbuseFilterLocallyDisabledGlobalActions (T332521)
  • 15:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:16 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:10 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:10 samtar@deploy2002: Finished scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) (duration: 09m 32s)
  • 15:09 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 15:02 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 15:02 samtar@deploy2002: samtar: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:02 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 15:00 samtar@deploy2002: Started scap: Backport for wgAbuseFilterConditionLimit: Set default condition limit to 2000 (T309609)
  • 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet
  • 14:51 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=maps1005.eqiad.wmnet
  • 14:49 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=kartotherian,name=maps1005.eqiad.wmnet
  • 14:47 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=maps1005.eqiad.wmnet
  • 14:38 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 14:38 hnowlan: disabling puppet on maps* before merging 760619
  • 14:37 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:27 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:17 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
  • 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:14 jnuche@deploy2002: Installing scap version "latest" for 587 hosts
  • 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:11 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:10 urbanecm@deploy2002: Finished scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443) (duration: 07m 53s)
  • 14:10 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:08 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:05 elukey@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 14:02 urbanecm@deploy2002: Started scap: Backport for Growth: Disable GEPersonalizedPraiseEnabled everywhere (T322443)
  • 14:00 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:58 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:40 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:38 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:33 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:29 elukey@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:28 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:25 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:21 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:16 elukey@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts kafka-main1005.eqiad.wmnet
  • 13:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 13:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, update idrac/bios/nic-firmware
  • 13:05 elukey: move kafka mirror maker instances to PKI migration settings (new truststores) - T319372
  • 11:20 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 11:09 joal: Unpause mediacounts_load airflow job with start_date set to 2023-03-21T10:00
  • 11:08 joal: Kill mediacounts_load oozie job
  • 11:07 joal: Unpause mediawiki_history_denormalize airflow job
  • 11:06 joal: Kill mediawiki_denormalize oozie job
  • 11:04 joal@deploy2002: Finished deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b] (duration: 00m 11s)
  • 11:04 joal@deploy2002: Started deploy [airflow-dags/analytics@42e862b]: Regular analytics weekly train [airflow-dags/analytics@42e862b]
  • 10:43 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:32 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:24 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9] (duration: 01m 30s)
  • 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@0bb61e9]
  • 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9] (duration: 00m 09s)
  • 10:22 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9] (thin): Regular analytics weekly train THIN [analytics/refinery@0bb61e9]
  • 10:22 joal@deploy2002: Finished deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9] (duration: 07m 48s)
  • 10:14 joal@deploy2002: Started deploy [analytics/refinery@0bb61e9]: Regular analytics weekly train [analytics/refinery@0bb61e9]
  • 09:43 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-main1005.eqiad.wmnet with OS bullseye
  • 09:39 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
  • 09:39 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on kafka-main1005.eqiad.wmnet with reason: Stop kafka, attempt to reimage
  • 09:25 phedenskog@deploy2002: Finished deploy [performance/navtiming@d2b97ad]: (no justification provided) (duration: 00m 06s)
  • 09:25 phedenskog@deploy2002: Started deploy [performance/navtiming@d2b97ad]: (no justification provided)
  • 09:06 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 09:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 08:31 elukey: move purged daemons on cp nodes to a new CA bundle (to allow accepting kafka clients using PKI tls certs) - T319372
  • 06:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13150
  • 06:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13150
  • 03:57 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.26 (duration: 02m 18s)
  • 03:55 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.41.0-wmf.1 refs T330207 (duration: 52m 38s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.41.0-wmf.1 refs T330207

2023-03-20

  • 22:00 samtar@deploy2002: Finished scap: Backport for Add languages to Minerva HTML (T331905) (duration: 09m 45s)
  • 21:52 samtar@deploy2002: jdlrobson and samtar: Backport for Add languages to Minerva HTML (T331905) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:50 samtar@deploy2002: Started scap: Backport for Add languages to Minerva HTML (T331905)
  • 21:34 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki shwiki --fix` T332614
  • 21:25 TheresNoTime: closing UTC late backport window, extended
  • 21:22 samtar@deploy2002: Finished scap: Backport for Rename project and project talk namespace for shwiki (T332614) (duration: 12m 22s)
  • 21:11 samtar@deploy2002: samtar and aleksandar: Backport for Rename project and project talk namespace for shwiki (T332614) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:10 samtar@deploy2002: Started scap: Backport for Rename project and project talk namespace for shwiki (T332614)
  • 21:09 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer (duration: 00m 13s)
  • 21:09 ebernhardson@deploy2002: Started deploy [airflow-dags/search@1302ca2]: ensure swift_upload delete_after is an integer
  • 21:09 samtar@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) (duration: 08m 34s)
  • 21:02 samtar@deploy2002: matmarex and samtar: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:00 TheresNoTime: extending UTC late backport window
  • 21:00 samtar@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at arwiki (T331313), Enable DiscussionTools usability improvements at arwiki (T329407)
  • 20:58 kharlan@deploy2002: Finished scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) (duration: 10m 28s)
  • 20:49 kharlan@deploy2002: kharlan: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmn
  • 20:47 kharlan@deploy2002: Started scap: Backport for TryNewTask: Set an array fallback if TryNewTaskOptOuts is null, PostEdit: Increment the edit-count-for-task-type count (T332319), LevelingUpManager: Handle links/link-recommendation collision (T332309)
  • 19:49 mutante: miscweb1003 - manually edit /srv/deployment/iegreview/iegreview-cache/.config and replace tin.eqiad.wmnet with deployment.eqiad.wmnet (which is an alias for deploy2002.codfw.wmnet) T257317 T332623 T331896
  • 19:13 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator (duration: 00m 13s)
  • 19:13 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b16917e]: fix templating in SimpleSkeinOperator
  • 18:56 ejegg: switched back to new PayPal pending transaction resolver
  • 18:48 akosiaris@deploy2002: Synchronized private/PrivateSettings.php: (no justification provided) (duration: 06m 28s)
  • 18:47 akosiaris: emergency rollover of redis password complete
  • 18:45 akosiaris: re-enable puppet on rdb*, netbox*, ores*, registry*
  • 18:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script (duration: 00m 13s)
  • 18:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3aaecb7]: safely quote spark args in skein script
  • 18:42 ejegg: civicrm upgraded from 3d3606f1 to 09373b9d
  • 18:32 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:32 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:31 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:30 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:28 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:28 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 18:18 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 18:16 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 18:16 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 18:16 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 18:15 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 18:11 akosiaris@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 18:11 akosiaris@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 18:05 mutante: miscweb1003 - syntax error in httpd config due to "Unknown Authn provider: ldap" - comes from static-rt vhost (T331896)
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1019.eqiad.wmnet
  • 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1019.eqiad.wmnet
  • 17:59 mutante: when applying apache role for the first time on new hosts we still have the same old conflict: miscweb1003 - manual "a2dismod mpm_event" to be able to let puppet enable mod PHP (T196968)
  • 17:57 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
  • 17:57 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on miscweb1003.eqiad.wmnet with reason: maintenance
  • 17:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:55 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:26 akosiaris: disable puppet on rdb*, netbox*, ores*, registry*
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs3006.esams.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 17:14 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2009.codfw.wmnet,lvs1019.eqiad.wmnet with reason: reboot for kernel update
  • 16:43 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:43 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:36 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:32 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:21 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:52 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:56 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 14:56 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:53 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 14:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 14:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2552
  • 14:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2552
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 and promote es2027 to es3 master', diff saved to https://phabricator.wikimedia.org/P45896 and previous config saved to /var/cache/conftool/dbconfig/20230320-143951-root.json
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:35 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
  • 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2008.codfw.wmnet with reason: T326564
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:17 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:17 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:11 TheresNoTime: close UTC afternoon backport window
  • 14:10 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
  • 14:10 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs1018.eqiad.wmnet with reason: rebooting for kernel updates
  • 14:08 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autopatrol' 'autopatrolled'` T331762
  • 14:06 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:05 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreview' 'autopatrol'` T331762
  • 14:03 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/namespaceDupes.php --wiki slwiki --fix` T332351
  • 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'reviewer' 'patrol'` T331762
  • 14:01 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/migrateUserGroup.php --wiki ptwikisource 'autoreviewer' 'autopatrol'` ("nothing to do") T331762
  • 14:00 TheresNoTime: `[samtar@mwmaint2002 ~]$ mwscript maintenance/emptyUserGroup.php --wiki ptwikisource editor` T331762
  • 13:58 samtar@deploy2002: Finished scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) (duration: 09m 44s)
  • 13:50 samtar@deploy2002: thiemowmde and samtar and zoranzoki21: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:49 samtar@deploy2002: Started scap: Backport for Remove meaningless restriction level "none", Remove FlaggedRevs from ptwikisource (T331762)
  • 13:47 samtar@deploy2002: Finished scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) (duration: 09m 26s)
  • 13:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host cuminunpriv1001.eqiad.wmnet with OS bullseye
  • 13:39 samtar@deploy2002: aleksandar and samtar: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:38 samtar@deploy2002: Started scap: Backport for SITENAME change of Serbo-Croatian Wikipedia (T332468)
  • 13:37 samtar@deploy2002: Finished scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) (duration: 08m 46s)
  • 13:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
  • 13:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2008.codfw.wmnet with reason: rebooting for kernel updates
  • 13:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
  • 13:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs3005.esams.wmnet with reason: rebooting for kernel updates
  • 13:30 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843 (duration: 01m 30s)
  • 13:29 samtar@deploy2002: stang and samtar: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
  • 13:29 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (eqiad): Update kartotherian to a6e9843
  • 13:28 samtar@deploy2002: Started scap: Backport for kuwiktionary: Add wordmark (T326067), trwikivoyage: Update wordmark (T332439)
  • 13:28 kharlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:26 awight@deploy2002: Finished deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843 (duration: 01m 39s)
  • 13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cuminunpriv1001.eqiad.wmnet with reason: host reimage
  • 13:24 awight@deploy2002: Started deploy [kartotherian/deploy@906be32] (codfw): Update kartotherian to a6e9843
  • 13:18 samtar@deploy2002: Finished scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) (duration: 11m 36s)
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host cuminunpriv1001.eqiad.wmnet with OS bullseye
  • 13:17 kharlan@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:17 kharlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:15 kharlan@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:14 kharlan@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:14 kharlan@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:08 samtar@deploy2002: stang and samtar: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 samtar@deploy2002: Started scap: Backport for bewiki: Remove group "autoeditor", "reviewer" (T326012), slwiki: Create Draft namespace (T332351)
  • 11:35 krinkle@deploy2002: Synchronized php-1.40.0-wmf.27/includes/libs/rdbms/: (no justification provided) (duration: 15m 28s)
  • 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 141082
  • 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 141082
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58655
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58655
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2552
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2552
  • 09:21 claime: Repooling parse2004 - T332119
  • 08:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'show' for AS: 138915
  • 08:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'show' for AS: 138915
  • 08:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138915
  • 08:00 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138915

2023-03-19

  • 18:27 AndyRussG: update config (to re-enable old PayPal orphan slayer job) 27a5b481 -> 6359222d
  • 16:44 apergos: dumpsdata1005 conversion to primary dumps nfs server done
  • 15:12 AndyRussG: update config (to disable paypal_ec pending transaction resolver) 5dd37c9c -> 3d3606f1
  • 14:18 apergos: work starting now to swap dumpsdata1005 in for primary nfs server, replacing dumpsdata1003 which will become dumps spare host
  • 00:17 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
  • 00:17 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)

2023-03-18

  • 22:47 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 22:47 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 14:26 apergos: rsync of xmldata public dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
  • 13:46 apergos: rsync of xmldata private dir from screen as ariel on dumpsdata1004 to dumpsdata1005, no bandwidth cap
  • 07:55 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 07:55 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Systemd units failing, puppet tries to bring them up periodically, spam on IRC
  • 02:57 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 05s)
  • 02:57 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 01:21 urandom: powercycling restbase2025 — T332462
  • 00:06 AndyRussG: Updating civicrm from 5dd37c9c to 3d3606f1

2023-03-17

  • 19:53 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching (duration: 00m 13s)
  • 19:53 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4aeffc6]: improve handling of ores threshold fetching
  • 19:52 bd808: Testing Mastodon account changes. This should post to @wikimedia_sal@botsin.space
  • 19:06 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch (duration: 00m 13s)
  • 19:06 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7d75578]: enable templating of ores threshold fetch
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
  • 18:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6002.drmrs.wmnet with reason: rebooting for kernel updates
  • 18:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
  • 18:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5005.eqsin.wmnet with reason: rebooting for kernel updates
  • 18:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: rebooting for kernel updates
  • 18:10 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 19s)
  • 18:09 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
  • 18:04 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2007.codfw.wmnet with reason: rebooting for kernel updates
  • 17:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
  • 17:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs6001.drmrs.wmnet with reason: rebooting for kernel updates
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs5004.eqsin.wmnet
  • 17:31 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs5004.eqsin.wmnet
  • 17:29 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs5004.eqsin.wmnet with reason: rebooting for kernel updates
  • 15:50 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 15:29 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 15:24 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 14:55 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:54 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 14:54 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:35 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 14:13 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 14:05 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 13:59 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:59 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:57 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:57 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:55 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:51 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 13:51 bking@cumin1001: START - Cookbook sre.wdqs.restart
  • 13:21 cgoubert@cumin1001: conftool action : set/pooled=inactive; selector: name=parse2004.codfw.wmnet
  • 13:21 claime: Depooling parse2004.codfw.wmnet for broken PSU - T332119
  • 12:06 mutante: systemctl reset-failed on gitlab-runner*
  • 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:16 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 11:03 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:02 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:45 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 09:38 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:57 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 07:28 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 05:56 marostegui@cumin1001: dbctl commit (dc=all): 'Add db1106 to dbctl', diff saved to https://phabricator.wikimedia.org/P45887 and previous config saved to /var/cache/conftool/dbconfig/20230317-055643-marostegui.json
  • 02:10 ejegg: civicrm upgraded from 672950d9 to 5dd37c9c
  • 01:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2010.codfw.wmnet
  • 01:05 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs2010.codfw.wmnet
  • 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
  • 00:35 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs1020.eqiad.wmnet with reason: rebooting for kernel updates
  • 00:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs2010.codfw.wmnet with reason: rebooting for kernel updates
  • 00:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates
  • 00:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs5006.eqsin.wmnet with reason: rebooting for kernel updates

2023-03-16

  • 23:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
  • 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:10:00 on lvs6003.drmrs.wmnet with reason: rebooting for kernel updates
  • 23:33 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
  • 23:33 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs3007.esams.wmnet with reason: rebooting for kernel updates
  • 23:31 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb2003.codfw.wmnet with OS bullseye
  • 23:28 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host miscweb1003.eqiad.wmnet with OS bullseye
  • 23:20 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0 (duration: 00m 19s)
  • 23:20 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e6f0142]: bump discolytics env to 0.7.0
  • 23:18 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
  • 23:15 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb2003.codfw.wmnet with reason: host reimage
  • 23:14 dzahn@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
  • 23:11 dzahn@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on miscweb1003.eqiad.wmnet with reason: host reimage
  • 23:01 dzahn@cumin1001: START - Cookbook sre.ganeti.reimage for host miscweb1003.eqiad.wmnet with OS bullseye
  • 23:00 dzahn@cumin2002: START - Cookbook sre.ganeti.reimage for host miscweb2003.codfw.wmnet with OS bullseye
  • 22:49 dzahn@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb1003.eqiad.wmnet
  • 22:42 dzahn@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host miscweb2003.codfw.wmnet
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb1003.eqiad.wmnet on all recursors
  • 22:39 dzahn@cumin1001: START - Cookbook sre.dns.wipe-cache miscweb1003.eqiad.wmnet on all recursors
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:39 dzahn@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
  • 22:38 dzahn@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb1003.eqiad.wmnet - dzahn@cumin1001"
  • 22:35 dzahn@cumin1001: START - Cookbook sre.dns.netbox
  • 22:35 dzahn@cumin1001: START - Cookbook sre.ganeti.makevm for new host miscweb1003.eqiad.wmnet
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) miscweb2003.codfw.wmnet on all recursors
  • 22:32 dzahn@cumin2002: START - Cookbook sre.dns.wipe-cache miscweb2003.codfw.wmnet on all recursors
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:32 dzahn@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
  • 22:31 dzahn@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM miscweb2003.codfw.wmnet - dzahn@cumin2002"
  • 22:29 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 22:29 dzahn@cumin2002: START - Cookbook sre.ganeti.makevm for new host miscweb2003.codfw.wmnet
  • 22:24 ejegg: civicrm upgraded from 68fa85cf to 672950d9
  • 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:09 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:04 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:54 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:47 brennen@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.27 refs T330205
  • 20:36 brennen: 1.40.0-wmf.27 train (T330205): blockers hopefully resolved, rolling to all wikis
  • 20:35 TheresNoTime: close UTC late backport window
  • 20:35 samtar@deploy2002: Finished scap: Backport for Remove sampling from breadCrumbs schema (duration: 08m 18s)
  • 20:28 samtar@deploy2002: samtar and sharvaniharan: Backport for Remove sampling from breadCrumbs schema synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:26 samtar@deploy2002: Started scap: Backport for Remove sampling from breadCrumbs schema
  • 20:21 brennen@deploy2002: Finished scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) (duration: 09m 06s)
  • 20:14 brennen@deploy2002: brennen and jforrester: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:12 brennen@deploy2002: Started scap: Backport for Revert "Upgrading lcobucci/jwt (4.1.5 => 4.3.0)" (T321160)
  • 19:28 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@a587106]: (no justification provided) (duration: 00m 12s)
  • 19:27 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@a587106]: (no justification provided)
  • 18:41 wfan: enable monthlyconvert for cz
  • 18:40 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided) (duration: 00m 13s)
  • 18:40 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@5c2c701]: (no justification provided)
  • 18:38 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2067.codfw.wmnet
  • 18:37 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 18:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4009.ulsfo.wmnet
  • 18:03 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4009.ulsfo.wmnet
  • 17:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:25:00 on lvs4009.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:40 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:40 ayounsi@cumin2002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling update on A:netbox-canary
  • 17:40 ayounsi@cumin2002: START - Cookbook sre.netbox.update-extras rolling update on A:netbox-canary
  • 17:36 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host thanos-fe1004.eqiad.wmnet with OS bullseye
  • 17:21 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host ms-fe1013.eqiad.wmnet with OS bullseye
  • 17:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 17:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 0:15:00 on lvs4008.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:59 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade. (duration: 00m 24s)
  • 16:58 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@e17ee96]: First deploy after Airflow 2.5.1 upgrade.
  • 16:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs4010.ulsfo.wmnet
  • 16:56 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs4010.ulsfo.wmnet
  • 16:47 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs4010.ulsfo.wmnet with reason: rebooting for kernel updates
  • 16:31 Emperor: reboot ms-be2067 again to see if the missing drive comes back
  • 16:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:39 claime: Pooled new mw hosts mw24[20-51].codfw.wmnet - T326363
  • 15:28 sukhe: enable puppet on R:class = dnsrecursor to merge CR: 898957 [done]
  • 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
  • 15:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
  • 15:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
  • 15:15 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
  • 15:15 claime: Pooling new mw hosts mw24[20-51].codfw.wmnet - T326363
  • 15:13 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=videoscaler
  • 15:12 cgoubert@cumin1001: conftool action : set/weight=25; selector: name=mw24[2345].*.codfw.wmnet,cluster=jobrunner
  • 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=api_appserver
  • 15:11 cgoubert@cumin1001: conftool action : set/weight=30; selector: name=mw24[2345].*.codfw.wmnet,cluster=appserver
  • 15:10 sukhe: disable puppet on R:class = dnsrecursor to merge CR: 898957
  • 15:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 32 hosts
  • 15:09 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 32 hosts
  • 14:50 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 14:49 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 14:44 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:31 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:06 urandom: ALTER-ing image_suggestions.suggestion table — T328670
  • 13:35 kostajh: UTC afternoon deploys done
  • 13:34 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag (duration: 07m 44s)
  • 13:28 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:27 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Remove unused GENewImpactD3Enabled flag
  • 13:15 kharlan@deploy2002: Finished scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) (duration: 09m 48s)
  • 13:07 kharlan@deploy2002: kharlan: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:05 kharlan@deploy2002: Started scap: Backport for GrowthExperiments: Enable LevelingUp features on testwiki (T317813)
  • 12:16 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 12:14 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 12:05 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:56 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:56 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 11:54 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 11:43 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:37 hnowlan@puppetmaster1001: conftool action : set/weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 11:32 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 11:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 11:30 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:29 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:27 hnowlan@puppetmaster1001: conftool action : set/weight=3; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:16 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 32 hosts with reason: new_install
  • 11:16 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 5:00:00 on 32 hosts with reason: new_install
  • 11:10 hnowlan@puppetmaster1001: conftool action : set/weight=2; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 11:07 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqsin
  • 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:06 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:04 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=4; selector: service=thumbor,name=kubernetes101[0123].eqiad.wmnet
  • 10:52 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:38 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqsin
  • 10:37 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqsin
  • 10:33 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:33 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: new_install
  • 10:32 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:32 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: new_install
  • 10:32 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 10:31 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:31 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:30 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:29 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:28 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179 to move it to x1', diff saved to https://phabricator.wikimedia.org/P45885 and previous config saved to /var/cache/conftool/dbconfig/20230316-100945-root.json
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1105.eqiad.wmnet
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:49 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1105.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:48 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1105.eqiad.wmnet
  • 08:40 kostajh: UTC morning deploys (second round) done
  • 08:40 kharlan@deploy2002: Finished scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) (duration: 12m 30s)
  • 08:29 kharlan@deploy2002: kharlan: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:27 kharlan@deploy2002: Started scap: Backport for SuggestedEditSession: Fix handling of post-save data refresh, Leveling up: always set wgGELevelingUpEnabledForUser (T332227)
  • 08:11 apergos: additional deployments for the UTC morning backport and config training window, running into the next hour, so window re-opened
  • 07:36 tgr_: UTC morning deploys done
  • 07:34 tgr@deploy2002: Finished scap: Backport for Leveling up: Backport recent changes (duration: 08m 13s)
  • 07:28 tgr@deploy2002: tgr: Backport for Leveling up: Backport recent changes synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 07:26 tgr@deploy2002: Started scap: Backport for Leveling up: Backport recent changes
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1105 from dbctl T331874', diff saved to https://phabricator.wikimedia.org/P45883 and previous config saved to /var/cache/conftool/dbconfig/20230316-062307-root.json
  • 06:03 marostegui: Failover m5 from db1106 to db1176 - T332155
  • 05:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T332155
  • 05:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T332155
  • 03:29 ejegg: payments-wiki upgraded from 1532b107 to 0fd66b1f

2023-03-15

  • 22:55 tzatziki: Removing 1 file for legal compliance
  • 22:30 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 55s)
  • 22:29 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
  • 22:29 brennen@deploy2002: Finished deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915) (duration: 00m 28s)
  • 22:28 brennen@deploy2002: Started deploy [phabricator/deployment@95b4f4b]: revert other assignee (T331915)
  • 22:08 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str (duration: 00m 14s)
  • 22:07 ebernhardson@deploy2002: Started deploy [airflow-dags/search@e17ee96]: max_partition macro now returns str
  • 21:59 brennen: end of phabricator update window (T331915)
  • 21:47 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 40s)
  • 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
  • 21:46 brennen@deploy2002: Finished deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130) (duration: 00m 28s)
  • 21:46 brennen@deploy2002: Started deploy [phabricator/deployment@982c225]: follow-up deploy for too large file message (T331915, T155130)
  • 21:26 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915) (duration: 00m 52s)
  • 21:25 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: deploy latest wmf/stable to phab1004 (T331915)
  • 21:19 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893] (duration: 00m 11s)
  • 21:19 milimetric@deploy2002: Started deploy [airflow-dags/analytics@c316893]: Deploying analytics dags [airflow-dags@c316893]
  • 21:13 mutante: phab* - upgrading PHP packages
  • 21:13 mutante: phabricator - maintenance window starting - expect possible downtime
  • 21:08 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
  • 21:08 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on phab2002.codfw.wmnet,phab1004.eqiad.wmnet with reason: maintenance
  • 20:56 brennen@deploy2002: Finished deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915) (duration: 00m 31s)
  • 20:55 brennen@deploy2002: Started deploy [phabricator/deployment@9e9b406]: test deploy of current state to phab2002 (T331915)
  • 20:54 brennen: starting phabricator window a touch early with a test deploy to phab2002
  • 20:51 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor (duration: 00m 16s)
  • 20:51 ebernhardson@deploy2002: Started deploy [airflow-dags/search@10fea1f]: correct arguments to RangeHivePartitionSensor
  • 20:48 TheresNoTime: close UTC late backport window
  • 20:48 samtar@deploy2002: Finished scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki (duration: 08m 46s)
  • 20:41 samtar@deploy2002: matmarex and samtar and esanders: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:39 samtar@deploy2002: Started scap: Backport for Enable remaining DiscussionTools visual enhancements at cswiki, huwiki (T329407), Clean up DiscussionTools config for mediawikiwiki
  • 20:35 samtar@deploy2002: Finished scap: Backport for Deploy action blocks on itwiki (T330533) (duration: 10m 30s)
  • 20:33 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3002.wikimedia.org with OS bullseye
  • 20:27 samtar@deploy2002: samtar and tsepothoabala: Backport for Deploy action blocks on itwiki (T330533) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:25 samtar@deploy2002: Started scap: Backport for Deploy action blocks on itwiki (T330533)
  • 20:23 samtar@deploy2002: Finished scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) (duration: 10m 12s)
  • 20:20 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1002.wikimedia.org with OS bullseye
  • 20:17 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2002.wikimedia.org with OS bullseye
  • 20:15 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 20:15 samtar@deploy2002: sgimeno and samtar: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:13 samtar@deploy2002: Started scap: Backport for GrowthExperiments: enable frontend of link recommendation for 6th round wikis (T304550), GrowthExperiments: Enable backend of link recommendation for 7, 8, 9th round wikis (T304551 T308133 T308134)
  • 20:12 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3002.wikimedia.org with reason: host reimage
  • 20:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 14s)
  • 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b33bb73]: newly ported dags, reduce failures in map_subgraph_queries
  • 20:11 taavi: deploy patch for T331192
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 20:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 20:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1002.wikimedia.org with reason: host reimage
  • 19:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2002.wikimedia.org with reason: host reimage
  • 19:54 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3002.wikimedia.org with OS bullseye
  • 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
  • 19:54 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:53 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
  • 19:53 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh3001.wikimedia.org with OS bullseye
  • 19:50 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 19:49 taavi@deploy2002: Finished scap: Backport for extdist: Add REL1_40 (T329085) (duration: 12m 04s)
  • 19:48 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1002.wikimedia.org with OS bullseye
  • 19:47 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1001.eqiad.wmnet with reason: host reimage
  • 19:46 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh1001.wikimedia.org with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
  • 19:45 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2002.wikimedia.org with OS bullseye
  • 19:45 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh2001.wikimedia.org with OS bullseye
  • 19:41 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6002.wikimedia.org with OS bullseye
  • 19:39 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['thanos-fe1004']
  • 19:39 taavi@deploy2002: taavi: Backport for extdist: Add REL1_40 (T329085) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 19:38 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:37 taavi@deploy2002: Started scap: Backport for extdist: Add REL1_40 (T329085)
  • 19:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
  • 19:35 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
  • 19:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ms-fe1013']
  • 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 19:32 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1001.eqiad.wmnet with OS bullseye
  • 19:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh3001.wikimedia.org with reason: host reimage
  • 19:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 19:28 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['thanos-fe1004']
  • 19:27 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1014.mgmt.eqiad.wmnet']
  • 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh2001.wikimedia.org with reason: host reimage
  • 19:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh1001.wikimedia.org with reason: host reimage
  • 19:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-fe1013']
  • 19:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6002.wikimedia.org with reason: host reimage
  • 19:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh1001.wikimedia.org with OS bullseye
  • 19:16 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh2001.wikimedia.org with OS bullseye
  • 19:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5002.wikimedia.org with OS bullseye
  • 19:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh3001.wikimedia.org with OS bullseye
  • 19:05 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6002.wikimedia.org with OS bullseye
  • 19:03 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh6001.wikimedia.org with OS bullseye
  • 18:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 18:49 mutante: adding new language prefix anp.wikipedia.org - Angika, an Eastern Indo-Aryan language spoken in some parts of the Indian states of Bihar and Jharkhand, as well as in parts of Nepal. (T332115)
  • 18:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5002.wikimedia.org with reason: host reimage
  • 18:46 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 18:42 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh6001.wikimedia.org with reason: host reimage
  • 18:25 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh6001.wikimedia.org with OS bullseye
  • 18:24 brennen@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.27 refs T330205 (duration: 06m 08s)
  • 18:20 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 18:19 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5002.wikimedia.org with OS bullseye
  • 18:18 brennen@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.27 refs T330205
  • 18:12 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries (duration: 00m 05s)
  • 18:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: newly ported dags, reduce failures in map_subgraph_queries
  • 18:06 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group1.
  • 18:04 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh5001.wikimedia.org with OS bullseye
  • 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:45 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 17:44 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1005.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
  • 17:43 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1002.eqiad.wmnet
  • 17:42 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh5001.wikimedia.org with reason: host reimage
  • 17:37 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
  • 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.eqiad.wmnet
  • 17:36 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1001.wmnet
  • 17:35 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2006.codfw.wmnet
  • 17:34 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4001.wikimedia.org with OS bullseye
  • 17:34 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2006.codfw.wmnet
  • 17:33 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2004.codfw.wmnet
  • 17:32 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2004.codfw.wmnet
  • 17:29 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
  • 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.eqiad.wmnet
  • 17:27 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
  • 17:25 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2003.eqiad.wmnet
  • 17:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 17:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4001.wikimedia.org with reason: host reimage
  • 17:12 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host doh5001.wikimedia.org with OS bullseye
  • 17:05 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4001.wikimedia.org with OS bullseye
  • 16:19 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 16:19 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 16:17 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 16:17 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 16:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 16:02 hnowlan: restarted thumbor-instances on thumbor1006
  • 16:01 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 15:59 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor1006.eqiad.wmnet
  • 15:52 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 15:49 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1002.eqiad.wmnet with reason: host reimage
  • 15:44 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host doh4002.wikimedia.org with OS bullseye
  • 15:34 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1002.eqiad.wmnet with OS bullseye
  • 15:33 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 15:30 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 15:19 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:11 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:10 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:04 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:01 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:54 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:54 Emperor: depool moss-fe1001 as rate of token denial is too high
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:53 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:53 claime: Redeploying mw-on-k8s for php7.4 update T330270
  • 14:52 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:49 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:46 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:41 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:41 cgoubert@deploy2002: Started scap: (no justification provided)
  • 14:41 claime: Rebuilding mw-on-k8s images - T330270
  • 14:38 claime: Updating php7.4 production images
  • 14:36 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 14:34 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 14:27 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on doh4002.wikimedia.org with reason: host reimage
  • 14:24 daniel@deploy2002: Finished scap: Backport for Always write parsoid output to parser cache. (T320534) (duration: 09m 57s)
  • 14:22 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 14:22 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 14:22 jbond@cumin1001: conftool action : set/pooled=true; selector: name=eqiad,dnsdisc=pki
  • 14:22 jbond: switch pki to be active active
  • 14:20 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 14:20 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 14:19 jbond: update pki to use discovery record
  • 14:16 jbond@cumin1001: conftool action : set/pooled=true; selector: name=codfw,dnsdisc=pki
  • 14:15 daniel@deploy2002: daniel: Backport for Always write parsoid output to parser cache. (T320534) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:14 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host doh4002.wikimedia.org with OS bullseye
  • 14:14 daniel@deploy2002: Started scap: Backport for Always write parsoid output to parser cache. (T320534)
  • 14:12 sukhe: [correction] depool _doh4002_ for reimaging to bullseye: T321309
  • 14:12 sukhe: depool dns4002 for reimaging to bullseye: T321309
  • 14:00 moritzm: nodejs security updates on buster
  • 13:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1003.eqiad.wmnet with OS bullseye
  • 13:50 sukhe: reprepro -C component/pdns-recursor include bullseye-wikimedia pdns-recursor_4.6.2-1+wmf11u1_amd64.changes: T321309
  • 13:49 moritzm: installing graphite-web security updates
  • 13:32 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 13:30 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:28 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:28 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:27 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:27 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1003.eqiad.wmnet with reason: host reimage
  • 13:26 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:25 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:24 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:22 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:22 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:21 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:20 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:18 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:17 taavi@deploy2002: Finished scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) (duration: 09m 01s)
  • 13:12 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1003.eqiad.wmnet with OS bullseye
  • 13:10 taavi@deploy2002: matmarex and taavi and esanders: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebu
  • 13:08 taavi@deploy2002: Started scap: Backport for Enable new Vector (2022) "Add topic" button at cswiki, huwiki (T331313), Enable DiscussionTools usability improvements at cswiki, huwiki (T329407), Disable visual enhancements on newsectionlink pages initially (T331635)
  • 13:08 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:07 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:27 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:24 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:18 marostegui: Failover m5 from db1176 to db1106 - T331877
  • 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:17 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: m5 master switch T331877
  • 12:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: m5 master switch T331877
  • 12:08 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:36 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 11:34 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 11:32 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 11:30 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 11:27 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 11:26 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 11:20 moritzm: imported packages into thirdparty/ceph-quincy
  • 11:16 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:16 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:14 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 11:13 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 11:00 claime: Redirecting test.wikidata.org to mw-on-k8s - T331268/25
  • 10:30 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:29 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:28 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:26 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:25 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:24 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:23 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:22 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:21 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:20 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:19 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 10:09 jayme@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 10:09 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 10:08 jayme@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 10:08 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:59 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/similar-users: apply
  • 09:58 jayme@deploy2002: helmfile [staging] START helmfile.d/services/similar-users: apply
  • 09:58 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:57 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 09:57 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 09:56 jayme@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 09:56 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/recommendation-api: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/recommendation-api: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 09:55 vgutierrez@cumin1001: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 09:55 jayme@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:55 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 09:54 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 09:53 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:53 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 09:52 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:51 jayme@deploy2002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:51 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:50 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:49 jayme@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:49 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 09:46 jayme@deploy2002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 09:46 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 09:45 jayme@deploy2002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 09:39 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 09:36 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 09:26 moritzm: rolling restart of FPM/Apache to pick up gnutls28 security updates
  • 09:22 moritzm: installing gnutls28 security updates
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1106 from dbctl T331875', diff saved to https://phabricator.wikimedia.org/P45872 and previous config saved to /var/cache/conftool/dbconfig/20230315-090515-root.json
  • 08:40 hashar@deploy2002: Finished deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199 (duration: 00m 19s)
  • 08:40 hashar@deploy2002: Started deploy [integration/docroot@5abe9c6]: Link Groovy doc of PipelineLib - T222199
  • 08:15 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=1) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 08:15 vgutierrez@cumin1001: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 07:40 tgr_: UTC morning deploys done
  • 07:39 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ms-be2067.codfw.wmnet
  • 07:36 tgr@deploy2002: Finished scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet (duration: 07m 54s)
  • 07:30 tgr@deploy2002: tgr: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:28 tgr@deploy2002: Started scap: Backport for LevelingUpManager: Ensure that $suggestions is a TaskSet
  • 06:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1105 (s1,s2) T331874', diff saved to https://phabricator.wikimedia.org/P45870 and previous config saved to /var/cache/conftool/dbconfig/20230315-062643-root.json
  • 06:20 marostegui: Remove pki2001 from m1 grants T332018

2023-03-14

  • 23:29 brennen@deploy2002: Finished scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205) (duration: 10m 32s)
  • 23:20 brennen@deploy2002: brennen and umherirrender: Backport for action: Restrict action.delete.js to action=delete pages (T330205) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 23:19 brennen@deploy2002: Started scap: Backport for action: Restrict action.delete.js to action=delete pages (T330205)
  • 22:50 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:34 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:25 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 22:08 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:38 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:38 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:20 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:17 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:16 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 21:11 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:47 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:47 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:43 ejegg: payments-wiki upgraded from 61c30a4f to 1532b107
  • 20:35 zabe@deploy2002: Finished scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) (duration: 08m 36s)
  • 20:28 zabe@deploy2002: zabe: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:27 zabe@deploy2002: Started scap: Backport for dewiki: Allow 'crats to remove sysopship and manage importers (T331921)
  • 20:04 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 20:03 jhathaway@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 19:47 topranks: Reboot cloudsw1-b1-codfw to upgrade JunOS version T327919
  • 19:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 19:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cloudsw1-b1-codfw,cloudsw1-b1-codfw IPv6,cloudsw1-b1-codfw.mgmt with reason: cloudsw1-b1-codfw OS upgrade
  • 19:32 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 19:30 brennen: 1.40.0-wmf.27 train (T330205): uneventful at group0. I'm AFK for about an hour.
  • 19:13 ejegg: civicrm upgraded from dbe3b716 to 68fa85cf
  • 18:51 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2002.codfw.wmnet with OS bullseye
  • 18:32 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 18:28 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 11s)
  • 18:27 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:27 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2002.codfw.wmnet with reason: host reimage
  • 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:25 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:25 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:22 fab@deploy2002: Finished deploy [airflow-dags/research@5edcd7b]: (no justification provided) (duration: 00m 30s)
  • 18:22 fab@deploy2002: Started deploy [airflow-dags/research@5edcd7b]: (no justification provided)
  • 18:15 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:13 brennen@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.27 refs T330205
  • 18:13 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2002.codfw.wmnet with OS bullseye
  • 18:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 18:06 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 18:03 brennen: 1.40.0-wmf.27 train (T330205): no current blockers, rolling to group0.
  • 17:59 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:59 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:58 hnowlan@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:56 hnowlan@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 17:56 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:55 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:53 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 17:52 hnowlan@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 17:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 17:08 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 16:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 16:47 sukhe: rolling restart of pdns-rec in A:wikidough to pick up config changes
  • 16:47 sukhe: rolling restart of pdns-rec to pick up config changes
  • 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts pki2001.codfw.wmnet
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:16 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 16:13 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pki2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jbond@cumin1001"
  • 16:11 jbond@cumin1001: START - Cookbook sre.dns.netbox
  • 16:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
  • 16:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on cephosd[1001-1005].eqiad.wmnet with reason: Bootstrapping ceph
  • 16:00 jbond@cumin1001: START - Cookbook sre.hosts.decommission for hosts pki2001.codfw.wmnet
  • 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2003.codfw.wmnet with OS bullseye
  • 15:36 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:32 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2003.codfw.wmnet with reason: host reimage
  • 15:30 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
  • 15:30 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on pki2001.codfw.wmnet with reason: decommission
  • 15:19 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2003.codfw.wmnet with OS bullseye
  • 15:00 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:59 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:52 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:51 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:43 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:42 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki1001.eqiad.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:38 jayme@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1001.eqiad.wmnet with OS bullseye
  • 14:19 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
  • 14:16 claime: All active/active services in eqiad repooled, DNS issues resolved - T331541
  • 14:16 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1001.eqiad.wmnet with reason: host reimage
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Decrease db2122 weight', diff saved to https://phabricator.wikimedia.org/P45866 and previous config saved to /var/cache/conftool/dbconfig/20230314-140926-root.json
  • 14:01 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki1001.eqiad.wmnet with OS bullseye
  • 14:00 jbond: reimage pki1001
  • 13:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:58 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:33 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results (again, with sukhe's more-correct variant!)
  • 13:27 TheresNoTime: close UTC afternoon backport window
  • 13:26 samtar@deploy2002: Finished scap: Backport for arwiki: Add new throttle rule (T331973) (duration: 07m 24s)
  • 13:20 samtar@deploy2002: samtar and urbanecm: Backport for arwiki: Add new throttle rule (T331973) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:19 samtar@deploy2002: Started scap: Backport for arwiki: Add new throttle rule (T331973)
  • 13:18 bblack: rolling out recdns fixup for missing 10/8 ECS affecting local inter-dc discovery/geoip results
  • 13:18 samtar@deploy2002: Finished scap: Backport for Enable VE on more namespaces on foundationwiki (T331079) (duration: 07m 55s)
  • 13:11 samtar@deploy2002: esanders and samtar: Backport for Enable VE on more namespaces on foundationwiki (T331079) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:10 samtar@deploy2002: Started scap: Backport for Enable VE on more namespaces on foundationwiki (T331079)
  • 13:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:04 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 13:02 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 12:58 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 12:44 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 12:43 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 12:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45864 and previous config saved to /var/cache/conftool/dbconfig/20230314-123515-marostegui.json
  • 12:23 moritzm: installing git security updates
  • 12:20 samtar@deploy2002: Finished scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) (duration: 09m 12s)
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45863 and previous config saved to /var/cache/conftool/dbconfig/20230314-122009-marostegui.json
  • 12:20 TheresNoTime: `Command '['helmfile', '-e', 'eqiad', '--selector', 'name=canary', 'apply']' returned non-zero exit status 1.` (P45862) during scap deployment of T297396 + T331680 — scap rolled back
  • 12:18 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki-root1001.eqiad.wmnet with OS bullseye
  • 12:13 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool appservers-ro in eqiad: T331541
  • 12:13 samtar@deploy2002: samtar and varnent: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 12:11 samtar@deploy2002: Started scap: Backport for [foundationwiki] Grant translation admin rights to 'editor' group (T297396), docroot: Update privacy policy footer link (T331680)
  • 12:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
  • 12:08 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
  • 12:08 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool appservers-ro in eqiad: T331541
  • 12:06 claime: Unlocked scap deployments - T331541
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45861 and previous config saved to /var/cache/conftool/dbconfig/20230314-120503-marostegui.json
  • 12:03 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:03 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool appservers-ro in eqiad: T331541
  • 11:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) appservers-ro.discovery.wmnet on all recursors
  • 11:51 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache appservers-ro.discovery.wmnet on all recursors
  • 11:51 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool appservers-ro in eqiad: T331541
  • 11:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45860 and previous config saved to /var/cache/conftool/dbconfig/20230314-114957-marostegui.json
  • 11:42 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 11:41 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 11:27 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 11:27 elukey@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329260)', diff saved to https://phabricator.wikimedia.org/P45857 and previous config saved to /var/cache/conftool/dbconfig/20230314-112354-marostegui.json
  • 11:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45856 and previous config saved to /var/cache/conftool/dbconfig/20230314-112333-marostegui.json
  • 11:19 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
  • 11:19 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
  • 11:13 claime: We are encountering unexpected DNS anycast issues following T331541; latencies are increased, but there is no production outage.
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45855 and previous config saved to /var/cache/conftool/dbconfig/20230314-110826-marostegui.json
  • 11:03 akosiaris@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mathoid.discovery.wmnet on all recursors
  • 11:03 akosiaris@cumin1001: START - Cookbook sre.dns.wipe-cache mathoid.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) api-ro.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache api-ro.discovery.wmnet on all recursors
  • 11:02 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
  • 10:58 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on pki-root1001.eqiad.wmnet with reason: host reimage
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45854 and previous config saved to /var/cache/conftool/dbconfig/20230314-105319-marostegui.json
  • 10:48 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T331541
  • 10:48 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T331541
  • 10:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
  • 10:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host pki-root1001.eqiad.wmnet with OS bullseye
  • 10:42 jbond: reimage pki-root1001
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45853 and previous config saved to /var/cache/conftool/dbconfig/20230314-103813-marostegui.json
  • 10:33 cgoubert@cumin1001: START - Cookbook sre.discovery.datacenter pool all active/active services in eqiad: Datacenter Switchover - eqiad RO repool - T331541
  • 10:32 claime: Repooling all active/active services in eqiad - T331541
  • 10:32 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=0)
  • 10:29 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet on all recursors
  • 10:28 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet on all recursors
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:28 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches (exit_code=99)
  • 10:28 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-optional-warmup-caches
  • 10:28 claime: Running sre.switchdc.mediawiki.00-optional-warmup-caches - T331541
  • 10:21 jbond: move pki.discovery.wmnet to pki2002 (bullseye)
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45852 and previous config saved to /var/cache/conftool/dbconfig/20230314-101918-marostegui.json
  • 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 10:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45851 and previous config saved to /var/cache/conftool/dbconfig/20230314-101840-marostegui.json
  • 10:15 jayme: enabling puppet on P:calico::kubernetes for T325268
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45850 and previous config saved to /var/cache/conftool/dbconfig/20230314-100334-marostegui.json
  • 10:02 claime: Locking scap deployment for service switchover - T331541
  • 10:00 claime: Locking scap deployment for service switchover - T330651
  • 09:56 jayme: disabling puppet on P:calico::kubernetes for T325268
  • 09:54 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:51 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:51 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45849 and previous config saved to /var/cache/conftool/dbconfig/20230314-094828-marostegui.json
  • 09:42 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:36 moritzm: installing NSS security updates
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45848 and previous config saved to /var/cache/conftool/dbconfig/20230314-093321-marostegui.json
  • 09:32 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:23 Emperor: reboot ms-be2040 T331860
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329260)', diff saved to https://phabricator.wikimedia.org/P45847 and previous config saved to /var/cache/conftool/dbconfig/20230314-090649-marostegui.json
  • 09:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 09:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45846 and previous config saved to /var/cache/conftool/dbconfig/20230314-084249-marostegui.json
  • 08:38 vgutierrez: test HAProxy 2.6.10 in cp4044 and cp4045
  • 08:31 vgutierrez: fetch haproxy 2.6.10 for thirdparty/haproxy26 (buster && bullseye) @ apt.wm.o
  • 08:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45845 and previous config saved to /var/cache/conftool/dbconfig/20230314-082743-marostegui.json
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45843 and previous config saved to /var/cache/conftool/dbconfig/20230314-081236-marostegui.json
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45842 and previous config saved to /var/cache/conftool/dbconfig/20230314-075730-marostegui.json
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329260)', diff saved to https://phabricator.wikimedia.org/P45841 and previous config saved to /var/cache/conftool/dbconfig/20230314-073210-marostegui.json
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45840 and previous config saved to /var/cache/conftool/dbconfig/20230314-073149-marostegui.json
  • 07:26 marostegui: Migrate db1183 to mariadb m5 eqiad dbmaint 10.6 T322294
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45839 and previous config saved to /var/cache/conftool/dbconfig/20230314-071643-marostegui.json
  • 07:13 marostegui: Migrate db2135 to mariadb m5 codfw dbmaint 10.6
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45838 and previous config saved to /var/cache/conftool/dbconfig/20230314-070137-marostegui.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45837 and previous config saved to /var/cache/conftool/dbconfig/20230314-064630-marostegui.json
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts centrallog1001
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:42 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:41 hashar: gerrit: changed `operations/puppet` merge strategy to allow "content merges" (see `ops` list for the rationale)
  • 06:36 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: centrallog1001 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:34 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 06:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts centrallog1001
  • 06:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329260)', diff saved to https://phabricator.wikimedia.org/P45836 and previous config saved to /var/cache/conftool/dbconfig/20230314-061633-marostegui.json
  • 06:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:07 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 05:07 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 05:05 ryankemper@deploy2002: Finished deploy [wdqs/wdqs@61ef435]: 0.3.122 (duration: 08m 45s)
  • 04:57 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.122` on canary `wdqs1003`; proceeding to rest of fleet
  • 04:56 ryankemper@deploy2002: Started deploy [wdqs/wdqs@61ef435]: 0.3.122
  • 04:56 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.122`. Pre-deploy tests passing on canary `wdqs1003`
  • 03:55 mwpresync@deploy2002: Pruned MediaWiki: 1.40.0-wmf.25 (duration: 02m 20s)
  • 03:53 mwpresync@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.27 refs T330205 (duration: 51m 02s)
  • 03:02 mwpresync@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.27 refs T330205
  • 02:22 legoktm: removed user's 2FA on wikitech for T331955
  • 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45835 and previous config saved to /var/cache/conftool/dbconfig/20230314-022023-marostegui.json
  • 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45834 and previous config saved to /var/cache/conftool/dbconfig/20230314-020517-marostegui.json
  • 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45833 and previous config saved to /var/cache/conftool/dbconfig/20230314-015011-marostegui.json
  • 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45832 and previous config saved to /var/cache/conftool/dbconfig/20230314-013504-marostegui.json
  • 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329260)', diff saved to https://phabricator.wikimedia.org/P45831 and previous config saved to /var/cache/conftool/dbconfig/20230314-012442-marostegui.json
  • 01:24 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45830 and previous config saved to /var/cache/conftool/dbconfig/20230314-012421-marostegui.json
  • 01:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45829 and previous config saved to /var/cache/conftool/dbconfig/20230314-010915-marostegui.json
  • 00:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45828 and previous config saved to /var/cache/conftool/dbconfig/20230314-005409-marostegui.json
  • 00:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45827 and previous config saved to /var/cache/conftool/dbconfig/20230314-003903-marostegui.json
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329260)', diff saved to https://phabricator.wikimedia.org/P45826 and previous config saved to /var/cache/conftool/dbconfig/20230314-002840-marostegui.json
  • 00:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45825 and previous config saved to /var/cache/conftool/dbconfig/20230314-002819-marostegui.json
  • 00:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45824 and previous config saved to /var/cache/conftool/dbconfig/20230314-001313-marostegui.json

2023-03-13

  • 23:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45823 and previous config saved to /var/cache/conftool/dbconfig/20230313-235807-marostegui.json
  • 23:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45822 and previous config saved to /var/cache/conftool/dbconfig/20230313-234301-marostegui.json
  • 23:39 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 23:33 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 23:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329260)', diff saved to https://phabricator.wikimedia.org/P45821 and previous config saved to /var/cache/conftool/dbconfig/20230313-233127-marostegui.json
  • 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45820 and previous config saved to /var/cache/conftool/dbconfig/20230313-233050-marostegui.json
  • 23:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45819 and previous config saved to /var/cache/conftool/dbconfig/20230313-231544-marostegui.json
  • 23:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45818 and previous config saved to /var/cache/conftool/dbconfig/20230313-230038-marostegui.json
  • 22:48 zabe@deploy2002: Finished scap: noc: Switch default selection on db.php from eqiad to codfw (duration: 06m 56s)
  • 22:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45817 and previous config saved to /var/cache/conftool/dbconfig/20230313-224532-marostegui.json
  • 22:41 zabe@deploy2002: Started scap: noc: Switch default selection on db.php from eqiad to codfw
  • 22:40 zabe@deploy2002: scap failed: BrokenPipeError [Errno 32] Broken pipe (duration: 00m 00s)
  • 22:40 zabe@deploy2002: Started scap: gerrit:898037
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45816 and previous config saved to /var/cache/conftool/dbconfig/20230313-223331-marostegui.json
  • 22:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45815 and previous config saved to /var/cache/conftool/dbconfig/20230313-223309-marostegui.json
  • 22:30 sbassett@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Set ext:StopForumSpam to enforce on es.wikiversity (duration: 06m 59s)
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45814 and previous config saved to /var/cache/conftool/dbconfig/20230313-221803-marostegui.json
  • 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45813 and previous config saved to /var/cache/conftool/dbconfig/20230313-220257-marostegui.json
  • 21:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45812 and previous config saved to /var/cache/conftool/dbconfig/20230313-214751-marostegui.json
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329260)', diff saved to https://phabricator.wikimedia.org/P45811 and previous config saved to /var/cache/conftool/dbconfig/20230313-213544-marostegui.json
  • 21:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45810 and previous config saved to /var/cache/conftool/dbconfig/20230313-213523-marostegui.json
  • 21:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging2001.codfw.wmnet with OS bullseye
  • 21:21 wfan: remove -d for jobs-dlocal queue runner
  • 21:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45809 and previous config saved to /var/cache/conftool/dbconfig/20230313-212017-marostegui.json
  • 21:06 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45808 and previous config saved to /var/cache/conftool/dbconfig/20230313-210510-marostegui.json
  • 21:04 herron@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 21:01 herron@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging2001.codfw.wmnet with reason: host reimage
  • 21:01 ejegg: enabled jobs-dlocal queue runner
  • 21:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 20:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45807 and previous config saved to /var/cache/conftool/dbconfig/20230313-205004-marostegui.json
  • 20:47 herron@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging2001.codfw.wmnet with OS bullseye
  • 20:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein (duration: 00m 14s)
  • 20:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@8685c9e]: drop_dated_directories.py must run through skein
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329260)', diff saved to https://phabricator.wikimedia.org/P45806 and previous config saved to /var/cache/conftool/dbconfig/20230313-203824-marostegui.json
  • 20:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45805 and previous config saved to /var/cache/conftool/dbconfig/20230313-203802-marostegui.json
  • 20:27 kindrobot: close UTC late backport window
  • 20:26 kindrobot@deploy2002: Finished scap: Backport for Add header at top of main page (T325362) (duration: 12m 11s)
  • 20:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45804 and previous config saved to /var/cache/conftool/dbconfig/20230313-202256-marostegui.json
  • 20:16 kindrobot@deploy2002: kindrobot and ksarabia: Backport for Add header at top of main page (T325362) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:15 kindrobot: start UTC late backport window
  • 20:14 kindrobot@deploy2002: Started scap: Backport for Add header at top of main page (T325362)
  • 20:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45803 and previous config saved to /var/cache/conftool/dbconfig/20230313-200750-marostegui.json
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 20:02 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 19:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45802 and previous config saved to /var/cache/conftool/dbconfig/20230313-195244-marostegui.json
  • 19:52 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:51 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:50 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1003.eqiad.wmnet
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329260)', diff saved to https://phabricator.wikimedia.org/P45801 and previous config saved to /var/cache/conftool/dbconfig/20230313-194148-marostegui.json
  • 19:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45800 and previous config saved to /var/cache/conftool/dbconfig/20230313-194116-marostegui.json
  • 19:39 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1003.eqiad.wmnet
  • 19:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:38 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:30 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1003.eqiad.wmnet
  • 19:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45799 and previous config saved to /var/cache/conftool/dbconfig/20230313-192610-marostegui.json
  • 19:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45798 and previous config saved to /var/cache/conftool/dbconfig/20230313-191104-marostegui.json
  • 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:59 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45797 and previous config saved to /var/cache/conftool/dbconfig/20230313-185558-marostegui.json
  • 18:49 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:48 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:47 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329260)', diff saved to https://phabricator.wikimedia.org/P45796 and previous config saved to /var/cache/conftool/dbconfig/20230313-184502-marostegui.json
  • 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:43 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable (duration: 00m 13s)
  • 18:43 ebernhardson@deploy2002: Started deploy [airflow-dags/search@196e10d]: allow spark3-submit as a valid spark executable
  • 18:38 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:36 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date (duration: 00m 14s)
  • 18:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:36 ebernhardson@deploy2002: Started deploy [airflow-dags/search@a8d066e]: Parameterize streaming updater reconcile start date
  • 18:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45795 and previous config saved to /var/cache/conftool/dbconfig/20230313-183628-marostegui.json
  • 18:33 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1002.eqiad.wmnet
  • 18:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45794 and previous config saved to /var/cache/conftool/dbconfig/20230313-182121-marostegui.json
  • 18:17 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 18:11 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 18:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 18:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45793 and previous config saved to /var/cache/conftool/dbconfig/20230313-180615-marostegui.json
  • 17:56 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 17:55 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45792 and previous config saved to /var/cache/conftool/dbconfig/20230313-175109-marostegui.json
  • 17:50 dancy@deploy2002: Finished scap: test cleanup (duration: 06m 40s)
  • 17:44 dancy@deploy2002: Started scap: test cleanup
  • 17:43 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329260)', diff saved to https://phabricator.wikimedia.org/P45791 and previous config saved to /var/cache/conftool/dbconfig/20230313-174030-marostegui.json
  • 17:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45790 and previous config saved to /var/cache/conftool/dbconfig/20230313-174009-marostegui.json
  • 17:35 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:33 eevans@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:32 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45789 and previous config saved to /var/cache/conftool/dbconfig/20230313-172503-marostegui.json
  • 17:22 dancy@deploy2002: Finished scap: testing T329857 (duration: 06m 54s)
  • 17:16 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 17:15 dancy@deploy2002: Started scap: testing T329857
  • 17:13 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:13 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:12 eevans@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:12 eevans@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts sessionstore1001.eqiad.wmnet
  • 17:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:11 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:codfw and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 17:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:10 Emperor: roll-restart of codfw eqiad frontends
  • 17:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45788 and previous config saved to /var/cache/conftool/dbconfig/20230313-170955-marostegui.json
  • 17:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:08 dancy@deploy2002: Installation of scap version "4.46.0" completed for 553 hosts
  • 17:07 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
  • 17:04 bd808: Ran cache.purge_openstack_users() for Striker following deploy of e1f7491 (T331674)
  • 17:04 dancy@deploy2002: Installing scap version "4.46.0" for 553 hosts
  • 16:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45787 and previous config saved to /var/cache/conftool/dbconfig/20230313-165449-marostegui.json
  • 16:47 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329260)', diff saved to https://phabricator.wikimedia.org/P45785 and previous config saved to /var/cache/conftool/dbconfig/20230313-164410-marostegui.json
  • 16:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45784 and previous config saved to /var/cache/conftool/dbconfig/20230313-164349-marostegui.json
  • 16:36 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 16:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45783 and previous config saved to /var/cache/conftool/dbconfig/20230313-162843-marostegui.json
  • 16:20 moritzm: imported tideways 5.0.4-2+wmf1+buster1+icu67u1 T329491
  • 16:18 dancy@deploy2002: Finished scap: testing (duration: 06m 53s)
  • 16:17 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 16:17 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 16:17 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 16:16 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45782 and previous config saved to /var/cache/conftool/dbconfig/20230313-161337-marostegui.json
  • 16:11 dancy@deploy2002: Started scap: testing
  • 16:06 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 15s)
  • 16:00 moritzm: imported xdebug 3.0.3+2.9.8+2.8.1+2.5.5-0+deb11u1+wmf1+buster1+icu67u1 T329491
  • 16:00 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 43s)
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45781 and previous config saved to /var/cache/conftool/dbconfig/20230313-155830-marostegui.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329260)', diff saved to https://phabricator.wikimedia.org/P45780 and previous config saved to /var/cache/conftool/dbconfig/20230313-154641-marostegui.json
  • 15:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:35 moritzm: imported php-yaml 2.2.1+2.1.0+2.0.4+1.3.2-2+wmf1~buster1+icu67u1 T329491
  • 15:31 dancy@deploy2002: Finished scap: testing T329857 (duration: 10m 08s)
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2118.codfw.wmnet with reason: Maintenance
  • 15:21 dancy@deploy2002: Started scap: testing T329857
  • 15:06 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45779 and previous config saved to /var/cache/conftool/dbconfig/20230313-150523-marostegui.json
  • 15:03 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 14:53 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 14:51 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45778 and previous config saved to /var/cache/conftool/dbconfig/20230313-145016-marostegui.json
  • 14:50 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 14:38 jbond: disable puppet fleet wide to debug strange issue
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P45777 and previous config saved to /var/cache/conftool/dbconfig/20230313-143510-marostegui.json
  • 14:23 claime: switch noc.wikimedia.org from eqiad to codfw - T331634
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45776 and previous config saved to /var/cache/conftool/dbconfig/20230313-142004-marostegui.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45774 and previous config saved to /var/cache/conftool/dbconfig/20230313-141409-marostegui.json
  • 14:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 14:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45773 and previous config saved to /var/cache/conftool/dbconfig/20230313-141348-marostegui.json
  • 14:00 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:59 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45772 and previous config saved to /var/cache/conftool/dbconfig/20230313-135842-marostegui.json
  • 13:50 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 13:49 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 13:48 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 13:48 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6] (duration: 00m 11s)
  • 13:48 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@4f393e6]
  • 13:47 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 13:46 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 13:45 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 13:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P45770 and previous config saved to /var/cache/conftool/dbconfig/20230313-134336-marostegui.json
  • 13:40 moritzm: imported wikidiff2 1.13.0-1+wmf1+buster1+icu67u1 T329491
  • 13:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45769 and previous config saved to /var/cache/conftool/dbconfig/20230313-132829-marostegui.json
  • 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
  • 13:25 moritzm: imported php-excimer 1.0.2-1+wmf2+buster1+icu67u1 T329491
  • 13:23 taavi@deploy2002: Finished scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) (duration: 08m 10s)
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45768 and previous config saved to /var/cache/conftool/dbconfig/20230313-132123-marostegui.json
  • 13:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45767 and previous config saved to /var/cache/conftool/dbconfig/20230313-132101-marostegui.json
  • 13:16 taavi@deploy2002: taavi and superpes: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:16 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:15 taavi@deploy2002: Started scap: Backport for [trwikiquote] Reverting temporary logo (Vector legacy + Vector 2022) (T329399), [trwiki] Removing the temporary logo, previously added, and already reverted (T329047)
  • 13:13 taavi@deploy2002: Finished scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) (duration: 09m 29s)
  • 13:11 moritzm: imported php-luasandbox 4.0.2-3+wmf1+buster1+icu67u1 T329491
  • 13:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45766 and previous config saved to /var/cache/conftool/dbconfig/20230313-130555-marostegui.json
  • 13:05 taavi@deploy2002: stang and taavi: Backport for zhwiki: Add movefile to extendedconfirmed (T331691) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:03 taavi@deploy2002: Started scap: Backport for zhwiki: Add movefile to extendedconfirmed (T331691)
  • 13:00 moritzm: imported php-wmerrors 2.0.0~git20190628.183ef7d-3+wmf1+buster1+icu67u1 T329491
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P45764 and previous config saved to /var/cache/conftool/dbconfig/20230313-125049-marostegui.json
  • 12:48 hnowlan: restarting codfw thumbor instances to attempt to remedy 502 issues
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thumbor,name=thumbor2005.codfw.wmnet
  • 12:46 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=thumbor,name=thumbor2005.codfw.wmnet
  • 12:37 moritzm: imported php-geoip 1.1.1-7+wmf2+buster1+icu67u1 T329491
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45763 and previous config saved to /var/cache/conftool/dbconfig/20230313-123543-marostegui.json
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T329260)', diff saved to https://phabricator.wikimedia.org/P45762 and previous config saved to /var/cache/conftool/dbconfig/20230313-122928-marostegui.json
  • 12:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 12:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45761 and previous config saved to /var/cache/conftool/dbconfig/20230313-122906-marostegui.json
  • 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:29 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:19 moritzm: imported php-redis 5.3.2+4.3.0-2+deb11u1+wmf1+buster1+icu67u1 T329491
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45760 and previous config saved to /var/cache/conftool/dbconfig/20230313-121400-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P45759 and previous config saved to /var/cache/conftool/dbconfig/20230313-115854-marostegui.json
  • 11:58 moritzm: imported php-memcached 3.1.5+2.2.0-5+deb11u1+wmf1+buster1+icu67u1 T329491
  • 11:46 moritzm: imported php-igbinary 3.2.1+2.0.8-2+wmf1+buster1+icu67u1 T329491
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45758 and previous config saved to /var/cache/conftool/dbconfig/20230313-114348-marostegui.json
  • 11:31 moritzm: imported php-apcu 5.1.19+4.0.11-3+wmf2+buster1+icu67u1 T329491
  • 11:22 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
  • 11:21 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
  • 11:11 moritzm: imported php-msgpack 2.1.2+0.5.7-2+wmf1+buster1+icu67u1 T329491
  • 10:55 moritzm: imported php-imagick 3.4.4+php8.0+3.4.4-2+deb11u2+wmf1+buster1+icu67u1 T329491
  • 10:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T329260)', diff saved to https://phabricator.wikimedia.org/P45757 and previous config saved to /var/cache/conftool/dbconfig/20230313-104322-marostegui.json
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45756 and previous config saved to /var/cache/conftool/dbconfig/20230313-104246-marostegui.json
  • 10:38 moritzm: imported php-pcov 1.0.6-4+wmf1~buster1+icu67u1 T329491
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:32 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:30 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45755 and previous config saved to /var/cache/conftool/dbconfig/20230313-102740-marostegui.json
  • 10:26 moritzm: imported php-defaults 7.4+76+wmf1~buster2+icu67u1 T329491
  • 10:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 55701
  • 10:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P45754 and previous config saved to /var/cache/conftool/dbconfig/20230313-101234-marostegui.json
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 55701
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38193
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38193
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 46632
  • 10:10 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 46632
  • 10:10 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6663
  • 10:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6663
  • 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45558
  • 10:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 45558
  • 10:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 38082
  • 10:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 38082
  • 10:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 668
  • 10:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 668
  • 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:05 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 10:02 moritzm: imported dh-php 0.35+wmf1+buster1+icu67u1 T329491
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45753 and previous config saved to /var/cache/conftool/dbconfig/20230313-095728-marostegui.json
  • 09:55 vgutierrez: Enable haproxy hardening in cp hosts globally - T323944
  • 09:52 zabe@deploy2002: Finished scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] (duration: 07m 40s)
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T329260)', diff saved to https://phabricator.wikimedia.org/P45752 and previous config saved to /var/cache/conftool/dbconfig/20230313-095119-marostegui.json
  • 09:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45751 and previous config saved to /var/cache/conftool/dbconfig/20230313-095058-marostegui.json
  • 09:48 jayme: pcc-worker1003:~# rm -r /srv/jenkins/puppet-compiler/40076 - / back to 70% usage
  • 09:46 zabe@deploy2002: jforrester and zabe: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply] synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 09:45 jayme: pcc-worker1002:~# rm -r /srv/jenkins/puppet-compiler/40078 - / back to 47% usage
  • 09:44 zabe@deploy2002: Started scap: Backport for Drop loading of former extension Renameuser's i18n strings [Re-apply]
  • 09:44 zabe@deploy2002: Finished scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) (duration: 07m 52s)
  • 09:40 jayme: pcc-worker1001:~# rm -r /srv/jenkins/puppet-compiler/40079 /srv/jenkins/puppet-compiler/38943 - / back to 68% usage
  • 09:38 zabe@deploy2002: zabe: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:36 zabe@deploy2002: Started scap: Backport for Revert "Revert "Unload RenameUser, now part of core: Part I of II"" (T331685)
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45750 and previous config saved to /var/cache/conftool/dbconfig/20230313-093552-marostegui.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P45749 and previous config saved to /var/cache/conftool/dbconfig/20230313-092045-marostegui.json
  • 09:16 moritzm: installing python-werkzeug security updates
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45748 and previous config saved to /var/cache/conftool/dbconfig/20230313-090539-marostegui.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45747 and previous config saved to /var/cache/conftool/dbconfig/20230313-085937-marostegui.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45746 and previous config saved to /var/cache/conftool/dbconfig/20230313-085916-marostegui.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45745 and previous config saved to /var/cache/conftool/dbconfig/20230313-084409-marostegui.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P45744 and previous config saved to /var/cache/conftool/dbconfig/20230313-082903-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45743 and previous config saved to /var/cache/conftool/dbconfig/20230313-081357-marostegui.json
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T329260)', diff saved to https://phabricator.wikimedia.org/P45742 and previous config saved to /var/cache/conftool/dbconfig/20230313-080759-marostegui.json
  • 08:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45741 and previous config saved to /var/cache/conftool/dbconfig/20230313-080738-marostegui.json
  • 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:05 nfraison@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:02 moritzm: installing curl security updates
  • 07:58 zabe@deploy2002: Finished scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) (duration: 07m 02s)
  • 07:53 zabe@deploy2002: zabe: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45740 and previous config saved to /var/cache/conftool/dbconfig/20230313-075232-marostegui.json
  • 07:51 zabe@deploy2002: Started scap: Backport for use core Renameuser classes (T27482), UserRenameHandler: Use core RenameUser classes (T27482)
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P45739 and previous config saved to /var/cache/conftool/dbconfig/20230313-073725-marostegui.json
  • 07:37 marostegui: Remove pagetriage_log from enwiki T328309
  • 07:32 kartik@deploy2002: Finished scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) (duration: 17m 04s)
  • 07:25 kartik@deploy2002: kartik: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45738 and previous config saved to /var/cache/conftool/dbconfig/20230313-072219-marostegui.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T329260)', diff saved to https://phabricator.wikimedia.org/P45737 and previous config saved to /var/cache/conftool/dbconfig/20230313-071522-marostegui.json
  • 07:15 kartik@deploy2002: Started scap: Backport for testwiki: Enable Section Translation on 11 Wikipedias (T327102 T326541)
  • 07:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45736 and previous config saved to /var/cache/conftool/dbconfig/20230313-071501-marostegui.json
  • 06:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45735 and previous config saved to /var/cache/conftool/dbconfig/20230313-065954-marostegui.json
  • 06:52 marostegui_: Remove pagetriage_log from testwiki and test2wiki T328309
  • 06:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P45734 and previous config saved to /var/cache/conftool/dbconfig/20230313-064448-marostegui.json
  • 06:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9873
  • 06:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9873
  • 06:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9507
  • 06:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9507
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15830
  • 06:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15830
  • 06:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9902
  • 06:31 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9902
  • 06:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 06:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45733 and previous config saved to /var/cache/conftool/dbconfig/20230313-062942-marostegui.json
  • 06:29 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 34549
  • 06:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 34549
  • 06:25 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 29357
  • 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 29357
  • 06:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T329260)', diff saved to https://phabricator.wikimedia.org/P45732 and previous config saved to /var/cache/conftool/dbconfig/20230313-062244-marostegui.json
  • 06:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 06:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 138886
  • 06:19 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 138886
  • 06:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 06:16 marostegui_: Deploy schema change on s3 codfw dbmaint T329684
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 04:37 kart_: Updated cxserver to 2023-03-09-061555-production (T331097, T327102, T326541)
  • 04:19 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 04:19 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 04:18 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 04:17 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 04:12 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 04:12 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply

2023-03-12

  • 10:47 elukey: reset offsets on kafka jumbo for benthos webrequest live (as indicated in https://phabricator.wikimedia.org/T331801#8685569)
  • 07:50 elukey: restart benthos-webrequest-live on centrallog1002 - T331801
  • 07:49 elukey: restart benthos-webrequest-live on centrallog2002 - T331801
  • 07:49 elukey: stop and mask benthos-webrequest-live on centrallog1001 - T331801

2023-03-10

  • 22:43 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:32 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 22:26 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 22:16 jhathaway@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:24 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:14 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:13 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:03 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:43 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78] (duration: 00m 10s)
  • 20:43 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@dd7fc78]
  • 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:20 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:39 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 00m 09s)
  • 19:38 milimetric@deploy2002: Started deploy [analytics/refinery@898a942] (thin): Special deploy for pageview job migration [analytics/refinery@898a942]
  • 19:38 milimetric@deploy2002: Finished deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942] (duration: 08m 08s)
  • 19:33 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:30 milimetric@deploy2002: Started deploy [analytics/refinery@898a942]: Special deploy for pageview job migration [analytics/refinery@898a942]
  • 19:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host ms-fe1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
  • 19:17 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: new ms-fe servers - cmjohnson@cumin1001"
  • 19:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 19:11 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:07 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1002.eqiad.wmnet
  • 19:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 19:00 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 19:00 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1002.eqiad.wmnet
  • 18:55 milimetric@deploy2002: Finished deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944] (duration: 00m 12s)
  • 18:55 milimetric@deploy2002: Started deploy [airflow-dags/analytics@4696eff]: Deploying analytics dags from origin/main_airflow_2.5 [airflow-dags@bb9a944]
  • 18:51 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 18:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2003-dev.codfw.wmnet with reason: host reimage
  • 18:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 18:31 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb2002-dev.codfw.wmnet with reason: host reimage
  • 18:22 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host cloudlb2003-dev.codfw.wmnet with OS bullseye
  • 18:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 18:12 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 18:04 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:59 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:53 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:52 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:51 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:44 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:44 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:40 cmooney@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:34 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:28 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:22 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 17:13 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb2002-dev.codfw.wmnet with OS bullseye
  • 16:49 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 16:42 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 16:24 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 16:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 16:04 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:04 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2003-dev']
  • 16:04 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:59 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:59 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:57 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:57 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:56 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:55 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2003-dev']
  • 15:53 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:53 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:50 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:50 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:36 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:35 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:35 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:35 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:34 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:34 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudlb2002-dev']
  • 15:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:08 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2003-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:52 cmooney@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb2002-dev.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:50 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:47 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 14:38 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:36 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:22 cmooney@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:20 cmooney@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.1 update - cmooney@cumin1001
  • 14:09 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 14:08 jbond@cumin1001: START - Cookbook sre.puppet.renew-cert for pki2002.codfw.wmnet: Renew puppet certificate - jbond@cumin1001
  • 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:55 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
  • 13:54 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add entries for new cloudlb. - cmooney@cumin1001"
  • 13:51 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:47 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:44 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:40 gmodena@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:39 gmodena@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 13:34 Emperor: restart swift-object-replicator on ms-be2067
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 12:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
  • 12:49 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Sync data for new cloudsw1-b1-codfw device. - cmooney@cumin1001 - T327919"
  • 12:46 moritzm: installing libsdl2 security updates
  • 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:32 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:31 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:28 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:24 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:23 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new files for private loopback ranges codfw. - cmooney@cumin1001"
  • 12:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:54 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:52 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host urldownloader1004.wikimedia.org with OS bullseye
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
  • 11:36 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on urldownloader1004.wikimedia.org with reason: host reimage
  • 11:35 moritzm: installing isc-dhcp bugfix updates from DLA 3326
  • 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:20 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 11:08 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1004.wikimedia.org with OS bullseye
  • 11:04 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=jawiki --logwiki=metawiki --ignorestatus 'あ ーあーあーあーあー' 'ARIAUSO' # T331685
  • 11:03 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ZSTK Lublin' 'Sonabet4' # T331685
  • 11:01 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Yair.herman' 'Manor258' # T331685
  • 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Tranquill Komnin' 'Nevechear' # T331685
  • 10:58 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'Tosikuni Japan' 'Revisionist14' # T331685
  • 10:54 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Studio 7 Piaseczno Jarosław Zawadzki' 'Jarosław Andrzej Zawadzki (muzyk)' # T331685
  • 10:52 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=afwiki --logwiki=metawiki --ignorestatus 'Siniy7' 'Viktorbublik' # T331685
  • 10:51 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=arwiki --logwiki=metawiki --ignorestatus 'Reza amjad(iran)' 'رضا امجد (تبریز)' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'Mac700' 'Unknown001100' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki --ignorestatus 'HonzaSTECH' 'ShadyMedic' # T331685
  • 10:48 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki --ignorestatus 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
  • 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'Mac700' 'Unknown001100' # T331685
  • 10:41 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'HonzaSTECH' 'ShadyMedic' # T331685
  • 10:40 zabe: zabe@mwmaint2002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'ExplosiveCreeper294' 'NotGalxyGaming' # T331685
  • 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:58 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
  • 09:57 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove netbox-generated DNS records which have been defined manually. - cmooney@cumin1001"
  • 09:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 02:09 zabe@deploy2002: Finished scap: T331685 (duration: 07m 52s)
  • 02:02 zabe@deploy2002: Started scap: T331685
  • 02:01 zabe@deploy2002: Finished scap: T331685 (duration: 07m 28s)
  • 02:00 ejegg: SmashPig upgraded from c6775c60 to 3b84e4cb
  • 01:55 ejegg: payments-wiki upgraded from 05a5e09a to 61c30a4f
  • 01:54 zabe@deploy2002: Started scap: T331685
  • 01:28 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 00:33 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye

2023-03-09

  • 23:52 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting (duration: 00m 14s)
  • 23:52 ebernhardson@deploy2002: Started deploy [airflow-dags/search@7b25fbf]: import_ttl: correct date formatting
  • 23:33 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor (duration: 00m 14s)
  • 23:32 ebernhardson@deploy2002: Started deploy [airflow-dags/search@b122672]: import_ttl: replace HdfsSensor with URLSensor
  • 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 23:09 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 23:04 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 23:01 sukhe: pool new dns hosts dns1003 and dns2003: T330670
  • 22:53 sukhe: run homer in cr*-{codfw,eqiad} for CR 896190
  • 22:47 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bullseye
  • 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2003.wikimedia.org with OS bullseye
  • 22:43 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:41 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:40 bd808: Forced puppet run on cloudweb100[34] to apply quick fix for T331674
  • 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
  • 22:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for new links to cloudsw1-b1-codfw - cmooney@cumin1001"
  • 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1003.wikimedia.org with OS bullseye
  • 22:20 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
  • 22:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 22:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2003.wikimedia.org with reason: host reimage
  • 22:14 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 22:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
  • 22:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns2003.wikimedia.org with OS bullseye
  • 21:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns2003.wikimedia.org with OS bullseye
  • 21:53 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
  • 21:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bullseye
  • 21:49 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1003.wikimedia.org with reason: host reimage
  • 21:38 TheresNoTime: close UTC late backport
  • 21:37 samtar@deploy2002: Finished scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) (duration: 10m 43s)
  • 21:35 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 21:35 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 21:28 samtar@deploy2002: samtar and nray: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:27 samtar@deploy2002: Started scap: Backport for Replace Cleopatra page with United_States to facilitate synthetic testing of T326829 (T326829)
  • 21:24 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part II of II (duration: 07m 38s)
  • 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
  • 21:19 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
  • 21:18 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Adjust and remove reverse DNS records after cloudsw1-b1-codfw migration. - cmooney@cumin1001"
  • 21:18 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part II of II synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:17 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part II of II
  • 21:16 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 21:14 samtar@deploy2002: Finished scap: Backport for Unload RenameUser, now part of core: Part I of II (duration: 12m 19s)
  • 21:10 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns2003
  • 21:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 21:09 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns2003
  • 21:09 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
  • 21:08 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
  • 21:07 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 21:03 samtar@deploy2002: samtar and jforrester: Backport for Unload RenameUser, now part of core: Part I of II synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:02 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.mgmt.codfw.wmnet on all recursors
  • 21:02 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.mgmt.codfw.wmnet on all recursors
  • 21:02 samtar@deploy2002: Started scap: Backport for Unload RenameUser, now part of core: Part I of II
  • 20:59 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns2003.wikimedia.org on all recursors
  • 20:59 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns2003.wikimedia.org on all recursors
  • 20:53 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:47 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
  • 20:46 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns2003 (renamed from authdns2001) - sukhe@cumin2002"
  • 20:44 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 20:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['dns1003.wikimedia.org']
  • 20:30 sukhe@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dns1003.wikimedia.org']
  • 20:25 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1003.wikimedia.org with OS bullseye
  • 20:24 topranks: move cloud-hosts1-b-codfw GW from core routers to cloudsw1-b1-codfw T327919
  • 20:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1003.wikimedia.org with OS bullseye
  • 20:12 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dns1003.wikimedia.org on all recursors
  • 20:12 sukhe@cumin2002: START - Cookbook sre.dns.wipe-cache dns1003.wikimedia.org on all recursors
  • 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:09 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 20:07 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 20:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:51 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster restart to enable incr shard recovery throughput - ryankemper@cumin1001 - T317816
  • 19:46 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
  • 19:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 12:00:00 on an-worker1078.eqiad.wmnet with reason: Replacing RAID BBU
  • 19:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns1003
  • 19:15 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns1003
  • 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 19:12 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns1003 (renamed from authdns1001) - sukhe@cumin2002"
  • 19:10 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.26 refs T330204
  • 19:06 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:53 sukhe: enable puppet on A:dns-rec and force puppet run: T330670
  • 18:50 mforns@deploy2002: Finished deploy [airflow-dags/analytics@3419b7d]: (no justification provided) (duration: 00m 10s)
  • 18:50 mforns@deploy2002: Started deploy [airflow-dags/analytics@3419b7d]: (no justification provided)
  • 18:47 sukhe: enable puppet on dns4003 to merge 895894
  • 18:44 sukhe: disable puppet on A:dns-rec to merge CR 895894
  • 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 18:38 jhathaway@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 18:34 sukhe: [correction] homer "cr*-codfw*" commit "Remove authdns2001 from homer, T330670"
  • 18:34 sukhe: homer "cr*-codfw*" commit "Remove authdns1001 from homer, T330670"
  • 18:31 sukhe: homer "cr*-eqiad*" commit "Remove authdns1001 from homer, T330670"
  • 18:26 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts authdns[1001,2001].wikimedia.org
  • 18:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:24 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: authdns[1001,2001].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:22 sukhe: running puppet-agent on A:dns-auth to remove deprecated authdns[12]001
  • 18:22 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:21 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:15 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts authdns[1001,2001].wikimedia.org
  • 18:11 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:10 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 18:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:08 cmooney@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:01 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 18:00 sukhe: cr*-codfw [ns0]: set routing-options static route 208.80.154.238/32 next-hop 208.80.153.77: T330670
  • 17:53 sukhe: cr*-codfw [ns1]: set routing-options static route 208.80.153.231/32 next-hop 208.80.153.77: T330670
  • 17:50 zabe@deploy2002: Finished scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) (duration: 11m 57s)
  • 17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45725 and previous config saved to /var/cache/conftool/dbconfig/20230309-174723-marostegui.json
  • 17:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:42 sukhe: [ns1] set routing-options static route 208.80.153.231/32 next-hop 208.80.154.10: T330670
  • 17:39 zabe@deploy2002: zabe and ssastry: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 17:38 zabe@deploy2002: Started scap: Backport for Revert "TransformHandler: Load stashed page bundle based on ETag." (T331629)
  • 17:37 sukhe: cr2-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
  • 17:37 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10: T330670
  • 17:36 sukhe: cr1-eqiad: set routing-options static route 208.80.154.238/32 next-hop 208.80.154.10
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45724 and previous config saved to /var/cache/conftool/dbconfig/20230309-173217-marostegui.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45723 and previous config saved to /var/cache/conftool/dbconfig/20230309-171711-marostegui.json
  • 17:13 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (prod links) T327919
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45722 and previous config saved to /var/cache/conftool/dbconfig/20230309-170205-marostegui.json
  • 16:55 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329260)', diff saved to https://phabricator.wikimedia.org/P45721 and previous config saved to /var/cache/conftool/dbconfig/20230309-165210-marostegui.json
  • 16:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45720 and previous config saved to /var/cache/conftool/dbconfig/20230309-165149-marostegui.json
  • 16:51 topranks: Add EBGP peering from cr1-codfw to cloudsw1-b1-codfw (cloud vrf) T327919
  • 16:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45719 and previous config saved to /var/cache/conftool/dbconfig/20230309-163643-marostegui.json
  • 16:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45718 and previous config saved to /var/cache/conftool/dbconfig/20230309-162608-root.json
  • 16:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45717 and previous config saved to /var/cache/conftool/dbconfig/20230309-162137-marostegui.json
  • 16:18 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief1001.eqiad.wmnet with OS bullseye
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45716 and previous config saved to /var/cache/conftool/dbconfig/20230309-161103-root.json
  • 16:09 zabe@deploy2002: Finished scap: T308932 (duration: 07m 19s)
  • 16:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45715 and previous config saved to /var/cache/conftool/dbconfig/20230309-160630-marostegui.json
  • 16:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:03 marostegui: Restart mailman service T331626
  • 16:02 zabe@deploy2002: Started scap: T308932
  • 16:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief1001.eqiad.wmnet with reason: host reimage
  • 16:00 marostegui: Failover m5 from db1183 to db1176 - T330847
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45714 and previous config saved to /var/cache/conftool/dbconfig/20230309-155558-root.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329260)', diff saved to https://phabricator.wikimedia.org/P45713 and previous config saved to /var/cache/conftool/dbconfig/20230309-155520-marostegui.json
  • 15:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45712 and previous config saved to /var/cache/conftool/dbconfig/20230309-155459-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45711 and previous config saved to /var/cache/conftool/dbconfig/20230309-154053-root.json
  • 15:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45710 and previous config saved to /var/cache/conftool/dbconfig/20230309-153953-marostegui.json
  • 15:29 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief1001.eqiad.wmnet with OS bullseye
  • 15:27 brett: Enable puppet on R:acme_chief::cert - T321309
  • 15:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45709 and previous config saved to /var/cache/conftool/dbconfig/20230309-152447-marostegui.json
  • 15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:15 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
  • 15:15 moritzm: installing PHP 7.3 security updates (as shipped in Debian)
  • 15:14 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS records for codfw cr links to cloudsw-b1-codfw. - cmooney@cumin1001"
  • 15:14 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:13 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:11 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:11 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45707 and previous config saved to /var/cache/conftool/dbconfig/20230309-151100-marostegui.json
  • 15:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 15:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 15:10 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:10 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45706 and previous config saved to /var/cache/conftool/dbconfig/20230309-150940-marostegui.json
  • 15:06 brett: Disable puppet on R:acme_chief::cert for acmechief maintenance - T321309
  • 15:04 zabe@deploy2002: Finished scap: Backport for Drop unused FlaggedRevs threshold level names (T277883) (duration: 10m 48s)
  • 15:04 TheresNoTime: close UTC afternoon backport window
  • 15:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
  • 15:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:30:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: m5 master switch T330847
  • 15:01 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:01 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:56 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:55 zabe@deploy2002: awight and zabe: Backport for Drop unused FlaggedRevs threshold level names (T277883) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:55 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:54 zabe@deploy2002: Started scap: Backport for Drop unused FlaggedRevs threshold level names (T277883)
  • 14:34 moritzm: installing apr security updates
  • 14:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 14:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 14:30 jgiannelos@deploy2002: Finished deploy [restbase/deploy@f774711]: (no justification provided) (duration: 19m 03s)
  • 14:13 samtar@deploy2002: Finished scap: Backport for Bump parsoid parser cache writes to 50%. (T320534) (duration: 07m 28s)
  • 14:11 jgiannelos@deploy2002: Started deploy [restbase/deploy@f774711]: (no justification provided)
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329260)', diff saved to https://phabricator.wikimedia.org/P45705 and previous config saved to /var/cache/conftool/dbconfig/20230309-140915-marostegui.json
  • 14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45704 and previous config saved to /var/cache/conftool/dbconfig/20230309-140850-marostegui.json
  • 14:08 Emperor: testing disk-swap in ms-be1066 T329305
  • 14:07 samtar@deploy2002: daniel and samtar: Backport for Bump parsoid parser cache writes to 50%. (T320534) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:05 samtar@deploy2002: Started scap: Backport for Bump parsoid parser cache writes to 50%. (T320534)
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45703 and previous config saved to /var/cache/conftool/dbconfig/20230309-140510-marostegui.json
  • 14:00 aqu@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b] (duration: 00m 13s)
  • 14:00 aqu@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: Upgrade to 2.5.1 from origin/T326194_airflow_deb_creation_with_gitlab_ci [airflow-dags@9fba86b]
  • 13:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45702 and previous config saved to /var/cache/conftool/dbconfig/20230309-135343-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45701 and previous config saved to /var/cache/conftool/dbconfig/20230309-135004-marostegui.json
  • 13:42 moritzm: restarting FPM/Apache on mw canaries to pick up curl updates
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45700 and previous config saved to /var/cache/conftool/dbconfig/20230309-133837-marostegui.json
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45699 and previous config saved to /var/cache/conftool/dbconfig/20230309-133458-marostegui.json
  • 13:34 moritzm: installing curl security updates
  • 13:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
  • 13:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2135,2160].codfw.wmnet,db[1117,1176,1183].eqiad.wmnet with reason: Topology changes
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45698 and previous config saved to /var/cache/conftool/dbconfig/20230309-132331-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45697 and previous config saved to /var/cache/conftool/dbconfig/20230309-131951-marostegui.json
  • 13:17 vgutierrez: rolling restart of pybal in lvs2009 and lvs2010
  • 13:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329260)', diff saved to https://phabricator.wikimedia.org/P45696 and previous config saved to /var/cache/conftool/dbconfig/20230309-131136-marostegui.json
  • 13:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
  • 13:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:03 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: btullis-T331115 - btullis@cumin1001"
  • 13:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45695 and previous config saved to /var/cache/conftool/dbconfig/20230309-130315-marostegui.json
  • 12:57 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=aqs,dc=codfw
  • 12:55 btullis@puppetmaster1001: conftool action : set/weight=10; selector: cluster=aqs,dc=codfw
  • 12:53 btullis@puppetmaster1001: conftool action : set/weight=10; selector: name=aqs2001.codfw.wmnet
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45694 and previous config saved to /var/cache/conftool/dbconfig/20230309-124809-marostegui.json
  • 12:46 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45693 and previous config saved to /var/cache/conftool/dbconfig/20230309-124025-marostegui.json
  • 12:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45692 and previous config saved to /var/cache/conftool/dbconfig/20230309-124004-marostegui.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45691 and previous config saved to /var/cache/conftool/dbconfig/20230309-123303-marostegui.json
  • 12:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45690 and previous config saved to /var/cache/conftool/dbconfig/20230309-123015-root.json
  • 12:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45689 and previous config saved to /var/cache/conftool/dbconfig/20230309-122458-marostegui.json
  • 12:22 moritzm: rebalancing ganeti eqiad/C after completion of bullseye updates T311687
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45688 and previous config saved to /var/cache/conftool/dbconfig/20230309-121756-marostegui.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45687 and previous config saved to /var/cache/conftool/dbconfig/20230309-121510-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45686 and previous config saved to /var/cache/conftool/dbconfig/20230309-120951-marostegui.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45685 and previous config saved to /var/cache/conftool/dbconfig/20230309-120559-marostegui.json
  • 12:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45684 and previous config saved to /var/cache/conftool/dbconfig/20230309-120537-marostegui.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45683 and previous config saved to /var/cache/conftool/dbconfig/20230309-120005-root.json
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45682 and previous config saved to /var/cache/conftool/dbconfig/20230309-115445-marostegui.json
  • 11:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45681 and previous config saved to /var/cache/conftool/dbconfig/20230309-115031-marostegui.json
  • 11:47 marostegui: Deploy schema change on s1 codfw dbmaint T329684
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45680 and previous config saved to /var/cache/conftool/dbconfig/20230309-114500-root.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45679 and previous config saved to /var/cache/conftool/dbconfig/20230309-114338-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:40 moritzm: installing git security updates
  • 11:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45678 and previous config saved to /var/cache/conftool/dbconfig/20230309-113525-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T329203)', diff saved to https://phabricator.wikimedia.org/P45677 and previous config saved to /var/cache/conftool/dbconfig/20230309-112804-marostegui.json
  • 11:28 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45676 and previous config saved to /var/cache/conftool/dbconfig/20230309-112739-marostegui.json
  • 11:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45675 and previous config saved to /var/cache/conftool/dbconfig/20230309-112019-marostegui.json
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45674 and previous config saved to /var/cache/conftool/dbconfig/20230309-111233-marostegui.json
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329260)', diff saved to https://phabricator.wikimedia.org/P45673 and previous config saved to /var/cache/conftool/dbconfig/20230309-110827-marostegui.json
  • 11:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45672 and previous config saved to /var/cache/conftool/dbconfig/20230309-110806-marostegui.json
  • 11:01 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 11:01 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 11:00 otto@deploy2002: Synchronized wmf-config/InitialiseSettings.php: Step 2b: InitialiseSettings.php - remove duplicate configs - T308932 (duration: 06m 37s)
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45671 and previous config saved to /var/cache/conftool/dbconfig/20230309-105726-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45670 and previous config saved to /var/cache/conftool/dbconfig/20230309-105259-marostegui.json
  • 10:50 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: Step 2a: ext-EventLogging.php - remove duplicate configs - T308932 (duration: 06m 32s)
  • 10:47 topranks: Resetting PIC in slot 1/0 on cr2-codfw T331527
  • 10:45 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:44 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45669 and previous config saved to /var/cache/conftool/dbconfig/20230309-104220-marostegui.json
  • 10:39 otto@deploy2002: Synchronized multiversion/MWConfigCacheGenerator.php: Step 1b: MWConfigCacheGenerator.php - load ext-EventStreamConfig.php - T308932 (duration: 06m 23s)
  • 10:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45668 and previous config saved to /var/cache/conftool/dbconfig/20230309-103753-marostegui.json
  • 10:32 hashar@deploy2002: Finished deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others (duration: 00m 08s)
  • 10:32 hashar@deploy2002: Started deploy [integration/docroot@095a329]: Add 'Test coverage' link for MW core and a few others
  • 10:29 otto@deploy2002: Synchronized wmf-config/ext-EventStreamConfig.php: Step 1a: ext-EventStreamConfig.php - wgEventStreams lives here - T308932 (duration: 06m 43s)
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:26 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:26 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:25 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:24 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:23 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45667 and previous config saved to /var/cache/conftool/dbconfig/20230309-102247-marostegui.json
  • 10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:22 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:22 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:22 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:22 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 9 hosts with reason: cr2-codfw linecard 1/0 reset
  • 10:21 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:21 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 10:19 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:19 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:13 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:12 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 10:11 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329260)', diff saved to https://phabricator.wikimedia.org/P45666 and previous config saved to /var/cache/conftool/dbconfig/20230309-101042-marostegui.json
  • 10:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45665 and previous config saved to /var/cache/conftool/dbconfig/20230309-101020-marostegui.json
  • 10:10 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1011.eqiad.wmnet to cluster eqiad and group C
  • 10:10 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T329203)', diff saved to https://phabricator.wikimedia.org/P45664 and previous config saved to /var/cache/conftool/dbconfig/20230309-100611-marostegui.json
  • 10:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 10:01 topranks: commencing work to drain cr2-codfw ports on card 1/0 (T331601)
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 09:55 marostegui: Deploy schema change on s4 codfw dbmaint T329684
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45663 and previous config saved to /var/cache/conftool/dbconfig/20230309-095514-marostegui.json
  • 09:53 marostegui: Deploy schema change on s8 codfw dbmaint T329684
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 09:48 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 09:48 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 09:46 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45662 and previous config saved to /var/cache/conftool/dbconfig/20230309-094602-root.json
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45661 and previous config saved to /var/cache/conftool/dbconfig/20230309-094008-marostegui.json
  • 09:33 topranks: resetting PIC 1/0 on cr1-codfw
  • 09:32 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
  • 09:32 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,cr2-codfw IPv6 with reason: cr1-codfw linecard 1/0 reset
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45660 and previous config saved to /var/cache/conftool/dbconfig/20230309-093120-marostegui.json
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45659 and previous config saved to /var/cache/conftool/dbconfig/20230309-093057-root.json
  • 09:29 elukey: delete old/unused ML-related docker images from the registry - T331513
  • 09:27 topranks: disabling Transit cct on cr1-codfw xe-1/0/1:0 (T331527)
  • 09:25 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45658 and previous config saved to /var/cache/conftool/dbconfig/20230309-092502-marostegui.json
  • 09:25 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on pfw3-codfw with reason: cr1-codfw linecard 1/0 reset
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1011.eqiad.wmnet with OS bullseye
  • 09:21 jnuche@deploy2002: Installation of scap version "latest" completed for 553 hosts
  • 09:20 jnuche@deploy2002: Installing scap version "latest" for 553 hosts
  • 09:19 marostegui: Deploy schema change on s7 codfw dbmaint T329684
  • 09:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45657 and previous config saved to /var/cache/conftool/dbconfig/20230309-091613-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45656 and previous config saved to /var/cache/conftool/dbconfig/20230309-091552-root.json
  • 09:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329260)', diff saved to https://phabricator.wikimedia.org/P45655 and previous config saved to /var/cache/conftool/dbconfig/20230309-091400-marostegui.json
  • 09:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:13 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45654 and previous config saved to /var/cache/conftool/dbconfig/20230309-091338-marostegui.json
  • 09:13 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 6 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:12 cmooney@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:12 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on 10 hosts with reason: cr1-codfw linecard 1/0 reset
  • 09:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1011.eqiad.wmnet with reason: host reimage
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45653 and previous config saved to /var/cache/conftool/dbconfig/20230309-090107-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45652 and previous config saved to /var/cache/conftool/dbconfig/20230309-090048-root.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45651 and previous config saved to /var/cache/conftool/dbconfig/20230309-085832-marostegui.json
  • 08:54 marostegui: Deploy schema change on s2 codfw dbmaint T329684
  • 08:54 marostegui: Deploy schema change on s5 codfw dbmaint T329684
  • 08:54 marostegui: Deploy schema change on s6 codfw dbmaint T329684
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1011.eqiad.wmnet with OS bullseye
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45650 and previous config saved to /var/cache/conftool/dbconfig/20230309-084601-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db2180 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45649 and previous config saved to /var/cache/conftool/dbconfig/20230309-084543-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329684)', diff saved to https://phabricator.wikimedia.org/P45648 and previous config saved to /var/cache/conftool/dbconfig/20230309-084359-marostegui.json
  • 08:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45647 and previous config saved to /var/cache/conftool/dbconfig/20230309-084326-marostegui.json
  • 08:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:39 taavi@deploy2002: Finished scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) (duration: 11m 37s)
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45646 and previous config saved to /var/cache/conftool/dbconfig/20230309-083802-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45645 and previous config saved to /var/cache/conftool/dbconfig/20230309-083604-root.json
  • 08:33 moritzm: remove ganeti1011 for eventual reimage T311687
  • 08:30 taavi@deploy2002: taavi and kharlan: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45644 and previous config saved to /var/cache/conftool/dbconfig/20230309-082820-marostegui.json
  • 08:28 taavi@deploy2002: Started scap: Backport for User impact: Work around MariaDB query planner bug (T331264), User impact: Work around MariaDB query planner bug (T331264)
  • 08:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
  • 08:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1011.eqiad.wmnet with reason: remove from cluster for reimage
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45643 and previous config saved to /var/cache/conftool/dbconfig/20230309-082257-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45642 and previous config saved to /var/cache/conftool/dbconfig/20230309-082059-root.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45641 and previous config saved to /var/cache/conftool/dbconfig/20230309-081707-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45640 and previous config saved to /var/cache/conftool/dbconfig/20230309-081646-marostegui.json
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T329203)', diff saved to https://phabricator.wikimedia.org/P45639 and previous config saved to /var/cache/conftool/dbconfig/20230309-080858-marostegui.json
  • 08:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 08:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45638 and previous config saved to /var/cache/conftool/dbconfig/20230309-080837-marostegui.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45637 and previous config saved to /var/cache/conftool/dbconfig/20230309-080752-root.json
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45636 and previous config saved to /var/cache/conftool/dbconfig/20230309-080555-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45635 and previous config saved to /var/cache/conftool/dbconfig/20230309-080140-marostegui.json
  • 07:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45634 and previous config saved to /var/cache/conftool/dbconfig/20230309-075331-marostegui.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45633 and previous config saved to /var/cache/conftool/dbconfig/20230309-075247-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45632 and previous config saved to /var/cache/conftool/dbconfig/20230309-075050-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45631 and previous config saved to /var/cache/conftool/dbconfig/20230309-074633-marostegui.json
  • 07:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45630 and previous config saved to /var/cache/conftool/dbconfig/20230309-073825-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45629 and previous config saved to /var/cache/conftool/dbconfig/20230309-073743-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45628 and previous config saved to /var/cache/conftool/dbconfig/20230309-073545-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45627 and previous config saved to /var/cache/conftool/dbconfig/20230309-073127-marostegui.json
  • 07:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45626 and previous config saved to /var/cache/conftool/dbconfig/20230309-072319-marostegui.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45625 and previous config saved to /var/cache/conftool/dbconfig/20230309-072238-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2104 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45624 and previous config saved to /var/cache/conftool/dbconfig/20230309-072040-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329684)', diff saved to https://phabricator.wikimedia.org/P45623 and previous config saved to /var/cache/conftool/dbconfig/20230309-071853-marostegui.json
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329260)', diff saved to https://phabricator.wikimedia.org/P45622 and previous config saved to /var/cache/conftool/dbconfig/20230309-071809-marostegui.json
  • 07:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:15 marostegui: Deploy schema change on s3 eqiad dbmaint T329684
  • 07:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 07:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 07:13 marostegui: Deploy schema change on s7 eqiad dbmaint T329684
  • 07:13 marostegui: Deploy schema change on s8 eqiad dbmaint T329684
  • 07:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3316', diff saved to https://phabricator.wikimedia.org/P45621 and previous config saved to /var/cache/conftool/dbconfig/20230309-071029-root.json
  • 07:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45620 and previous config saved to /var/cache/conftool/dbconfig/20230309-070805-marostegui.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db2117 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45619 and previous config saved to /var/cache/conftool/dbconfig/20230309-070733-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45618 and previous config saved to /var/cache/conftool/dbconfig/20230309-070658-marostegui.json
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329684)', diff saved to https://phabricator.wikimedia.org/P45617 and previous config saved to /var/cache/conftool/dbconfig/20230309-070327-marostegui.json
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329684)', diff saved to https://phabricator.wikimedia.org/P45616 and previous config saved to /var/cache/conftool/dbconfig/20230309-070223-marostegui.json
  • 07:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui: Deploy schema change on s1 eqiad dbmaint T329684
  • 06:48 marostegui: Deploy schema change on s4 eqiad dbmaint T329684
  • 06:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 06:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T329203)', diff saved to https://phabricator.wikimedia.org/P45615 and previous config saved to /var/cache/conftool/dbconfig/20230309-064538-marostegui.json
  • 06:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 06:43 marostegui: Deploy schema change on s2 eqiad dbmaint T329684
  • 06:42 marostegui: Deploy schema change on s5 eqiad dbmaint T329684
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
  • 06:40 marostegui: Deploy schema change on s6 eqiad dbmaint T329684
  • 06:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
  • 06:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 04:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45614 and previous config saved to /var/cache/conftool/dbconfig/20230309-040925-marostegui.json
  • 03:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45613 and previous config saved to /var/cache/conftool/dbconfig/20230309-035418-marostegui.json
  • 03:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45612 and previous config saved to /var/cache/conftool/dbconfig/20230309-033912-marostegui.json
  • 03:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45611 and previous config saved to /var/cache/conftool/dbconfig/20230309-032406-marostegui.json
  • 03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329260)', diff saved to https://phabricator.wikimedia.org/P45610 and previous config saved to /var/cache/conftool/dbconfig/20230309-030445-marostegui.json
  • 03:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 03:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 03:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45609 and previous config saved to /var/cache/conftool/dbconfig/20230309-030424-marostegui.json
  • 02:59 sukhe: run keyholder arm on acmechief2001
  • 02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45608 and previous config saved to /var/cache/conftool/dbconfig/20230309-024917-marostegui.json
  • 02:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45607 and previous config saved to /var/cache/conftool/dbconfig/20230309-023411-marostegui.json
  • 02:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45606 and previous config saved to /var/cache/conftool/dbconfig/20230309-021905-marostegui.json
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45604 and previous config saved to /var/cache/conftool/dbconfig/20230309-015831-marostegui.json
  • 01:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45603 and previous config saved to /var/cache/conftool/dbconfig/20230309-015810-marostegui.json
  • 01:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45602 and previous config saved to /var/cache/conftool/dbconfig/20230309-014303-marostegui.json
  • 01:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45601 and previous config saved to /var/cache/conftool/dbconfig/20230309-012757-marostegui.json
  • 01:18 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors (duration: 00m 13s)
  • 01:18 ebernhardson@deploy2002: Started deploy [airflow-dags/search@558da74]: correct eventgate datacenter partitioning in sensors
  • 01:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45600 and previous config saved to /var/cache/conftool/dbconfig/20230309-011251-marostegui.json
  • 00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329260)', diff saved to https://phabricator.wikimedia.org/P45599 and previous config saved to /var/cache/conftool/dbconfig/20230309-005220-marostegui.json
  • 00:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 00:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45598 and previous config saved to /var/cache/conftool/dbconfig/20230309-005210-marostegui.json
  • 00:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45597 and previous config saved to /var/cache/conftool/dbconfig/20230309-003703-marostegui.json
  • 00:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45596 and previous config saved to /var/cache/conftool/dbconfig/20230309-002157-marostegui.json
  • 00:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45594 and previous config saved to /var/cache/conftool/dbconfig/20230309-000651-marostegui.json

2023-03-08

  • 23:50 zabe@deploy2002: Finished scap: T308932 (duration: 07m 15s)
  • 23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329260)', diff saved to https://phabricator.wikimedia.org/P45593 and previous config saved to /var/cache/conftool/dbconfig/20230308-234534-marostegui.json
  • 23:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 23:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 23:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45592 and previous config saved to /var/cache/conftool/dbconfig/20230308-234502-marostegui.json
  • 23:42 zabe@deploy2002: Started scap: T308932
  • 23:42 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths (duration: 00m 14s)
  • 23:42 ebernhardson@deploy2002: Started deploy [airflow-dags/search@29f73a4]: update virtualenv entry_points to use relative paths
  • 23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45591 and previous config saved to /var/cache/conftool/dbconfig/20230308-232956-marostegui.json
  • 23:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45590 and previous config saved to /var/cache/conftool/dbconfig/20230308-231449-marostegui.json
  • 22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45589 and previous config saved to /var/cache/conftool/dbconfig/20230308-225943-marostegui.json
  • 22:44 hashar: Upgrading CI Jenkins
  • 22:42 tgr: UTC late deploys done
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329260)', diff saved to https://phabricator.wikimedia.org/P45588 and previous config saved to /var/cache/conftool/dbconfig/20230308-224044-marostegui.json
  • 22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 22:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45587 and previous config saved to /var/cache/conftool/dbconfig/20230308-224018-marostegui.json
  • 22:39 tgr@deploy2002: Finished scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) (duration: 08m 31s)
  • 22:32 tgr@deploy2002: tgr: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:30 tgr@deploy2002: Started scap: Backport for Leveling up: check if the task type is registered before increasing its edit count (T331524), Leveling up: check if the task type is registered before increasing its edit count (T331524)
  • 22:29 tgr@deploy2002: Finished scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) (duration: 07m 43s)
  • 22:27 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 22:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45586 and previous config saved to /var/cache/conftool/dbconfig/20230308-222512-marostegui.json
  • 22:23 tgr@deploy2002: tgr: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 22:21 tgr@deploy2002: Started scap: Backport for maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412), maintenance: Adjust query builder to account for no secondary namespaces (T321983 T331412)
  • 22:21 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 22:20 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 22:12 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 22:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45585 and previous config saved to /var/cache/conftool/dbconfig/20230308-221006-marostegui.json
  • 22:09 kindrobot: hand off backport window UTC late to tgr for self-service
  • 22:07 kindrobot@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) (duration: 09m 36s)
  • 21:59 kindrobot@deploy2002: sbailey and kindrobot: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:57 kindrobot@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for all wikis (T299612)
  • 21:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45584 and previous config saved to /var/cache/conftool/dbconfig/20230308-215500-marostegui.json
  • 21:54 kindrobot@deploy2002: Finished scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) (duration: 07m 49s)
  • 21:48 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.cod
  • 21:46 kindrobot@deploy2002: Started scap: Backport for Switch order of "Add topic" and language dropdown (T267444), Release DiscussionTools on mobile on enwiki (T328942), Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588)
  • 21:37 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sessionstore1001.eqiad.wmnet
  • 21:31 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host sessionstore1001.eqiad.wmnet
  • 21:30 kindrobot@deploy2002: kemayo and kindrobot and esanders: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqi
  • 21:29 kindrobot@deploy2002: Started scap: Backport for Enable history page visual diffs everywhere except Wikipedias and Wiktionaries (T314588), Release DiscussionTools on mobile on enwiki (T328942), Switch order of "Add topic" and language dropdown (T267444)
  • 21:22 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix (duration: 00m 05s)
  • 21:22 ebernhardson@deploy2002: Started deploy [airflow-dags/search@3419b7d]: test deploy after deployment fix
  • 21:19 kindrobot: start UTC-late backport window
  • 21:08 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided) (duration: 01m 01s)
  • 21:07 hashar@deploy2002: Started deploy [releng/jenkins-deploy@0e465ac] (releasing): (no justification provided)
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329260)', diff saved to https://phabricator.wikimedia.org/P45583 and previous config saved to /var/cache/conftool/dbconfig/20230308-205435-marostegui.json
  • 20:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 20:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45582 and previous config saved to /var/cache/conftool/dbconfig/20230308-205414-marostegui.json
  • 20:51 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief2001.codfw.wmnet with OS bullseye
  • 20:41 mutante: deploy2002 - systemctl restart keyholder-proxy.service to fix T331568 - after this SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh -i /etc/keyholder.d/deploy_jenkins -l deploy-jenkins releases1002.eqiad.wmnet works
  • 20:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 20:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45581 and previous config saved to /var/cache/conftool/dbconfig/20230308-203907-marostegui.json
  • 20:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief2001.codfw.wmnet with reason: host reimage
  • 20:24 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief2001.codfw.wmnet with OS bullseye
  • 20:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45580 and previous config saved to /var/cache/conftool/dbconfig/20230308-202401-marostegui.json
  • 20:18 urandom: power cycle restbase2022 (unresponsive; cannot SSH)
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45579 and previous config saved to /var/cache/conftool/dbconfig/20230308-200855-marostegui.json
  • 20:01 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test1001.eqiad.wmnet with OS bullseye
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329260)', diff saved to https://phabricator.wikimedia.org/P45578 and previous config saved to /var/cache/conftool/dbconfig/20230308-194646-marostegui.json
  • 19:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45577 and previous config saved to /var/cache/conftool/dbconfig/20230308-194625-marostegui.json
  • 19:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 19:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test1001.eqiad.wmnet with reason: host reimage
  • 19:31 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test1001.eqiad.wmnet with OS bullseye
  • 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45576 and previous config saved to /var/cache/conftool/dbconfig/20230308-193118-marostegui.json
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45575 and previous config saved to /var/cache/conftool/dbconfig/20230308-191612-marostegui.json
  • 19:16 jhuneidi@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.26 refs T330204 (duration: 06m 16s)
  • 19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
  • 19:13 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse entries for new links from CRs to cloudsw1-b1-codfw. - cmooney@cumin1001"
  • 19:09 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.26 refs T330204
  • 19:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for acmechief-test2001.codfw.wmnet
  • 19:09 sukhe@cumin2002: START - Cookbook sre.hosts.remove-downtime for acmechief-test2001.codfw.wmnet
  • 19:08 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45574 and previous config saved to /var/cache/conftool/dbconfig/20230308-190106-marostegui.json
  • 18:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45573 and previous config saved to /var/cache/conftool/dbconfig/20230308-184328-marostegui.json
  • 18:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329260)', diff saved to https://phabricator.wikimedia.org/P45572 and previous config saved to /var/cache/conftool/dbconfig/20230308-184204-marostegui.json
  • 18:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 18:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45571 and previous config saved to /var/cache/conftool/dbconfig/20230308-184143-marostegui.json
  • 18:36 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45570 and previous config saved to /var/cache/conftool/dbconfig/20230308-183020-ladsgroup.json
  • 18:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45569 and previous config saved to /var/cache/conftool/dbconfig/20230308-182822-marostegui.json
  • 18:28 inflatador: bking@cumin2002 repool elastic1060-1066 to finish off T322082
  • 18:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45568 and previous config saved to /var/cache/conftool/dbconfig/20230308-182726-marostegui.json
  • 18:27 inflatador: bking@cumin2002 unban elastic1060-1066 to finish off T322082
  • 18:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45567 and previous config saved to /var/cache/conftool/dbconfig/20230308-182637-marostegui.json
  • 18:26 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1064-65 - bking@cumin2002 - T322082"
  • 18:18 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1064-65 - bking@cumin2002 - T322082"
  • 18:16 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host acmechief-test2001.codfw.wmnet with OS bullseye
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45566 and previous config saved to /var/cache/conftool/dbconfig/20230308-181514-ladsgroup.json
  • 18:14 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:13 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "update location of elastic1065 - bking@cumin2002 - T322082"
  • 18:13 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1065 - bking@cumin2002 - T322082"
  • 18:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45565 and previous config saved to /var/cache/conftool/dbconfig/20230308-181316-marostegui.json
  • 18:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45564 and previous config saved to /var/cache/conftool/dbconfig/20230308-181220-marostegui.json
  • 18:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1064 - bking@cumin2002 - T322082"
  • 18:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45563 and previous config saved to /var/cache/conftool/dbconfig/20230308-181131-marostegui.json
  • 18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:09 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:09 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:05 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1064 - bking@cumin2002 - T322082"
  • 18:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
  • 18:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:02 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 18:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 18:00 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 18:00 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109', diff saved to https://phabricator.wikimedia.org/P45562 and previous config saved to /var/cache/conftool/dbconfig/20230308-180008-ladsgroup.json
  • 17:59 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 17:59 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 17:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1066 - bking@cumin2002 - T322082"
  • 17:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 17:58 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 17:58 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 17:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45561 and previous config saved to /var/cache/conftool/dbconfig/20230308-175810-marostegui.json
  • 17:58 herron: failing grafana over from codfw to eqiad
  • 17:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45560 and previous config saved to /var/cache/conftool/dbconfig/20230308-175714-marostegui.json
  • 17:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on acmechief-test2001.codfw.wmnet with reason: host reimage
  • 17:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45559 and previous config saved to /var/cache/conftool/dbconfig/20230308-175625-marostegui.json
  • 17:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:51 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:48 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1066.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:47 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1064.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:47 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host acmechief-test2001.codfw.wmnet with OS bullseye
  • 17:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T328817)', diff saved to https://phabricator.wikimedia.org/P45558 and previous config saved to /var/cache/conftool/dbconfig/20230308-174535-marostegui.json
  • 17:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:45 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1065.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 17:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45557 and previous config saved to /var/cache/conftool/dbconfig/20230308-174514-marostegui.json
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45556 and previous config saved to /var/cache/conftool/dbconfig/20230308-174501-ladsgroup.json
  • 17:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45555 and previous config saved to /var/cache/conftool/dbconfig/20230308-174208-marostegui.json
  • 17:38 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329260)', diff saved to https://phabricator.wikimedia.org/P45554 and previous config saved to /var/cache/conftool/dbconfig/20230308-173701-marostegui.json
  • 17:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 17:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45553 and previous config saved to /var/cache/conftool/dbconfig/20230308-173640-marostegui.json
  • 17:34 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:34 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T329203)', diff saved to https://phabricator.wikimedia.org/P45552 and previous config saved to /var/cache/conftool/dbconfig/20230308-173125-marostegui.json
  • 17:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:31 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45551 and previous config saved to /var/cache/conftool/dbconfig/20230308-173104-marostegui.json
  • 17:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45550 and previous config saved to /var/cache/conftool/dbconfig/20230308-173007-marostegui.json
  • 17:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1066.eqiad.wmnet']
  • 17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1065.eqiad.wmnet']
  • 17:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45549 and previous config saved to /var/cache/conftool/dbconfig/20230308-172134-marostegui.json
  • 17:21 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1064.eqiad.wmnet']
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45548 and previous config saved to /var/cache/conftool/dbconfig/20230308-171558-marostegui.json
  • 17:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45547 and previous config saved to /var/cache/conftool/dbconfig/20230308-171501-marostegui.json
  • 17:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45546 and previous config saved to /var/cache/conftool/dbconfig/20230308-170627-marostegui.json
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1109 (T318605)', diff saved to https://phabricator.wikimedia.org/P45545 and previous config saved to /var/cache/conftool/dbconfig/20230308-170512-ladsgroup.json
  • 17:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1109.eqiad.wmnet with reason: Maintenance
  • 17:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45543 and previous config saved to /var/cache/conftool/dbconfig/20230308-170051-marostegui.json
  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45542 and previous config saved to /var/cache/conftool/dbconfig/20230308-165955-marostegui.json
  • 16:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45541 and previous config saved to /var/cache/conftool/dbconfig/20230308-165121-marostegui.json
  • 16:49 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T328817)', diff saved to https://phabricator.wikimedia.org/P45540 and previous config saved to /var/cache/conftool/dbconfig/20230308-164807-marostegui.json
  • 16:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 16:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45539 and previous config saved to /var/cache/conftool/dbconfig/20230308-164746-marostegui.json
  • 16:47 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45538 and previous config saved to /var/cache/conftool/dbconfig/20230308-164545-marostegui.json
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:35 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1063.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:34 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
  • 16:34 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1062 - bking@cumin2002 - T322082"
  • 16:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45537 and previous config saved to /var/cache/conftool/dbconfig/20230308-163311-marostegui.json
  • 16:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45536 and previous config saved to /var/cache/conftool/dbconfig/20230308-163249-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45535 and previous config saved to /var/cache/conftool/dbconfig/20230308-163240-marostegui.json
  • 16:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329260)', diff saved to https://phabricator.wikimedia.org/P45534 and previous config saved to /var/cache/conftool/dbconfig/20230308-163230-marostegui.json
  • 16:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 16:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1060 - bking@cumin2002 - T322082"
  • 16:28 hnowlan@puppetmaster1001: conftool action : set/pooled=inactive; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:28 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1060 - bking@cumin2002 - T322082"
  • 16:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
  • 16:25 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 16:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1061 - bking@cumin2002 - T322082"
  • 16:22 hnowlan@puppetmaster1001: conftool action : set/pooled=yes:weight=10; selector: service=thumbor,name=kubernetes201[0123].codfw.wmnet
  • 16:22 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45533 and previous config saved to /var/cache/conftool/dbconfig/20230308-161737-marostegui.json
  • 16:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45532 and previous config saved to /var/cache/conftool/dbconfig/20230308-161727-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 16:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:10 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1062.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
  • 16:08 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1062.eqiad.wmnet with reason: re-rack
  • 16:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1062.eqiad.wmnet
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on elastic1061.eqiad.wmnet with reason: re-rack
  • 16:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Muehlenhoff out of all services on: 4 hosts
  • 16:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Muehlenhoff out of all services on: 4 hosts
  • 16:03 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1060.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45531 and previous config saved to /var/cache/conftool/dbconfig/20230308-160231-marostegui.json
  • 16:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1061.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45530 and previous config saved to /var/cache/conftool/dbconfig/20230308-160221-marostegui.json
  • 16:00 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1062.eqiad.wmnet
  • 16:00 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1061.eqiad.wmnet
  • 15:59 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 15:55 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:54 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:54 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic1061.eqiad.wmnet
  • 15:52 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:52 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:50 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:49 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T328817)', diff saved to https://phabricator.wikimedia.org/P45529 and previous config saved to /var/cache/conftool/dbconfig/20230308-154736-marostegui.json
  • 15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45528 and previous config saved to /var/cache/conftool/dbconfig/20230308-154724-marostegui.json
  • 15:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 15:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45527 and previous config saved to /var/cache/conftool/dbconfig/20230308-154709-marostegui.json
  • 15:46 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1063.eqiad.wmnet']
  • 15:42 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1062.eqiad.wmnet']
  • 15:33 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 15:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45526 and previous config saved to /var/cache/conftool/dbconfig/20230308-153202-marostegui.json
  • 15:31 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 15:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2113.codfw.wmnet with reason: Maintenance
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 15:22 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Fix typo in rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 06m 41s)
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45525 and previous config saved to /var/cache/conftool/dbconfig/20230308-151656-marostegui.json
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:06 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:05 otto@deploy2002: Synchronized wmf-config/ext-EventLogging.php: wgEventStreams - Declare rc1.enrichment.mediawiki_page_content_change.error stream - T326536 (duration: 11m 33s)
  • 15:05 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:04 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45524 and previous config saved to /var/cache/conftool/dbconfig/20230308-150150-marostegui.json
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45523 and previous config saved to /var/cache/conftool/dbconfig/20230308-145245-marostegui.json
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:52 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:50 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45522 and previous config saved to /var/cache/conftool/dbconfig/20230308-144934-marostegui.json
  • 14:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45521 and previous config saved to /var/cache/conftool/dbconfig/20230308-144924-marostegui.json
  • 14:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T329203)', diff saved to https://phabricator.wikimedia.org/P45520 and previous config saved to /var/cache/conftool/dbconfig/20230308-144659-marostegui.json
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 14:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45519 and previous config saved to /var/cache/conftool/dbconfig/20230308-144634-marostegui.json
  • 14:46 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:46 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:45 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:44 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:43 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:42 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:42 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:42 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:41 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:41 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:41 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:40 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:39 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:39 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:38 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:37 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45518 and previous config saved to /var/cache/conftool/dbconfig/20230308-143739-marostegui.json
  • 14:37 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:36 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:35 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:34 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45517 and previous config saved to /var/cache/conftool/dbconfig/20230308-143418-marostegui.json
  • 14:34 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:33 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:32 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:32 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:32 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 14:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45516 and previous config saved to /var/cache/conftool/dbconfig/20230308-143127-marostegui.json
  • 14:25 inflatador: bking@cumin2002 powering down elastic1060-66 for re-rack T322082
  • 14:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P45514 and previous config saved to /var/cache/conftool/dbconfig/20230308-142233-marostegui.json
  • 14:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45513 and previous config saved to /var/cache/conftool/dbconfig/20230308-141911-marostegui.json
  • 14:16 TheresNoTime: close UTC afternoon backport window
  • 14:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45511 and previous config saved to /var/cache/conftool/dbconfig/20230308-141621-marostegui.json
  • 14:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45510 and previous config saved to /var/cache/conftool/dbconfig/20230308-140727-marostegui.json
  • 14:07 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45509 and previous config saved to /var/cache/conftool/dbconfig/20230308-140405-marostegui.json
  • 14:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45508 and previous config saved to /var/cache/conftool/dbconfig/20230308-140115-marostegui.json
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T328817)', diff saved to https://phabricator.wikimedia.org/P45507 and previous config saved to /var/cache/conftool/dbconfig/20230308-135153-marostegui.json
  • 13:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 13:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45506 and previous config saved to /var/cache/conftool/dbconfig/20230308-135132-marostegui.json
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T329203)', diff saved to https://phabricator.wikimedia.org/P45505 and previous config saved to /var/cache/conftool/dbconfig/20230308-134945-marostegui.json
  • 13:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45504 and previous config saved to /var/cache/conftool/dbconfig/20230308-134034-marostegui.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T329260)', diff saved to https://phabricator.wikimedia.org/P45503 and previous config saved to /var/cache/conftool/dbconfig/20230308-134002-marostegui.json
  • 13:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 13:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45502 and previous config saved to /var/cache/conftool/dbconfig/20230308-133940-marostegui.json
  • 13:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45501 and previous config saved to /var/cache/conftool/dbconfig/20230308-133626-marostegui.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45500 and previous config saved to /var/cache/conftool/dbconfig/20230308-132528-marostegui.json
  • 13:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45499 and previous config saved to /var/cache/conftool/dbconfig/20230308-132434-marostegui.json
  • 13:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45498 and previous config saved to /var/cache/conftool/dbconfig/20230308-132120-marostegui.json
  • 13:18 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:18 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:13 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host urldownloader1003.wikimedia.org with OS bullseye
  • 13:11 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 13:11 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
  • 13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45497 and previous config saved to /var/cache/conftool/dbconfig/20230308-131022-marostegui.json
  • 13:10 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 13:10 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: sync
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P45496 and previous config saved to /var/cache/conftool/dbconfig/20230308-130928-marostegui.json
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45495 and previous config saved to /var/cache/conftool/dbconfig/20230308-130613-marostegui.json
  • 13:02 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:02 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:00 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T328817)', diff saved to https://phabricator.wikimedia.org/P45494 and previous config saved to /var/cache/conftool/dbconfig/20230308-125548-marostegui.json
  • 12:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45493 and previous config saved to /var/cache/conftool/dbconfig/20230308-125527-marostegui.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45492 and previous config saved to /var/cache/conftool/dbconfig/20230308-125515-marostegui.json
  • 12:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45491 and previous config saved to /var/cache/conftool/dbconfig/20230308-125422-marostegui.json
  • 12:50 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45490 and previous config saved to /var/cache/conftool/dbconfig/20230308-124945-marostegui.json
  • 12:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45489 and previous config saved to /var/cache/conftool/dbconfig/20230308-124924-marostegui.json
  • 12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:48 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mediawiki-page-content-change-enrichment: apply
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45488 and previous config saved to /var/cache/conftool/dbconfig/20230308-124344-marostegui.json
  • 12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45487 and previous config saved to /var/cache/conftool/dbconfig/20230308-124334-marostegui.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45486 and previous config saved to /var/cache/conftool/dbconfig/20230308-124021-marostegui.json
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45485 and previous config saved to /var/cache/conftool/dbconfig/20230308-123418-marostegui.json
  • 12:31 hnowlan: running authdns-update for r/890398
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45484 and previous config saved to /var/cache/conftool/dbconfig/20230308-122827-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45483 and previous config saved to /var/cache/conftool/dbconfig/20230308-122515-marostegui.json
  • 12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:22 hnowlan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P45482 and previous config saved to /var/cache/conftool/dbconfig/20230308-121912-marostegui.json
  • 12:17 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bullseye
  • 12:14 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45480 and previous config saved to /var/cache/conftool/dbconfig/20230308-121321-marostegui.json
  • 12:10 hnowlan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add service records for device-analytics - hnowlan@cumin1001"
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45479 and previous config saved to /var/cache/conftool/dbconfig/20230308-121009-marostegui.json
  • 12:09 jmm@cumin2002: END (ERROR) - Cookbook sre.ganeti.reimage (exit_code=97) for host urldownloader1003.wikimedia.org with OS bullseye
  • 12:08 hnowlan@cumin1001: START - Cookbook sre.dns.netbox
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45478 and previous config saved to /var/cache/conftool/dbconfig/20230308-120406-marostegui.json
  • 12:01 claime: restbase-async back in standard state - T330651
  • 12:01 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 12:00 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) depool restbase-async in codfw: T330651
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T329260)', diff saved to https://phabricator.wikimedia.org/P45477 and previous config saved to /var/cache/conftool/dbconfig/20230308-115935-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T328817)', diff saved to https://phabricator.wikimedia.org/P45476 and previous config saved to /var/cache/conftool/dbconfig/20230308-115924-marostegui.json
  • 11:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45475 and previous config saved to /var/cache/conftool/dbconfig/20230308-115913-marostegui.json
  • 11:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45474 and previous config saved to /var/cache/conftool/dbconfig/20230308-115903-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45473 and previous config saved to /var/cache/conftool/dbconfig/20230308-115815-marostegui.json
  • 11:57 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 11:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 11:55 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 11:55 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route depool restbase-async in codfw: T330651
  • 11:55 claime: restbase-async pooled in eqiad, depooling in codfw - T330651
  • 11:54 cgoubert@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0) pool restbase-async in eqiad: T330651
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45472 and previous config saved to /var/cache/conftool/dbconfig/20230308-115252-root.json
  • 11:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) restbase-async.discovery.wmnet on all recursors
  • 11:49 cgoubert@cumin1001: START - Cookbook sre.dns.wipe-cache restbase-async.discovery.wmnet on all recursors
  • 11:49 cgoubert@cumin1001: START - Cookbook sre.discovery.service-route pool restbase-async in eqiad: T330651
  • 11:49 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9] (duration: 01m 30s)
  • 11:48 claime: Starting restbase-async switchback - T330651
  • 11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4aaff9]
  • 11:47 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9] (duration: 00m 07s)
  • 11:47 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9] (thin): Regular analytics weekly train THIN [analytics/refinery@d4aaff9]
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T329203)', diff saved to https://phabricator.wikimedia.org/P45471 and previous config saved to /var/cache/conftool/dbconfig/20230308-114652-marostegui.json
  • 11:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45470 and previous config saved to /var/cache/conftool/dbconfig/20230308-114642-marostegui.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45469 and previous config saved to /var/cache/conftool/dbconfig/20230308-114553-root.json
  • 11:44 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bullseye
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45468 and previous config saved to /var/cache/conftool/dbconfig/20230308-114407-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45467 and previous config saved to /var/cache/conftool/dbconfig/20230308-114357-marostegui.json
  • 11:42 otto@deploy2002: Finished deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 05m 09s)
  • 11:37 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
  • 11:37 otto@deploy2002: deploy aborted: Regular analytics weekly train [analytics/refinery@d4aaff9] (duration: 09m 38s)
  • 11:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45466 and previous config saved to /var/cache/conftool/dbconfig/20230308-113136-marostegui.json
  • 11:29 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:29 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P45465 and previous config saved to /var/cache/conftool/dbconfig/20230308-112901-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45464 and previous config saved to /var/cache/conftool/dbconfig/20230308-112850-marostegui.json
  • 11:27 otto@deploy2002: Started deploy [analytics/refinery@d4aaff9]: Regular analytics weekly train [analytics/refinery@d4aaff9]
  • 11:27 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 11:27 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 11:26 akosiaris: T307943 upgrade kubernetes-client on deploy1002 deploy2002
  • 11:25 jmm@cumin2002: START - Cookbook sre.ganeti.reimage for host urldownloader1003.wikimedia.org with OS bullseye
  • 11:23 claime: Traffic: authdns updated successfully for eqiad repool - T331285
  • 11:21 claime: Traffic: repool eqiad for user traffic - T331285
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45463 and previous config saved to /var/cache/conftool/dbconfig/20230308-111628-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45462 and previous config saved to /var/cache/conftool/dbconfig/20230308-111355-marostegui.json
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45461 and previous config saved to /var/cache/conftool/dbconfig/20230308-111344-marostegui.json
  • 11:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45460 and previous config saved to /var/cache/conftool/dbconfig/20230308-110907-marostegui.json
  • 11:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45459 and previous config saved to /var/cache/conftool/dbconfig/20230308-110846-marostegui.json
  • 11:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T328817)', diff saved to https://phabricator.wikimedia.org/P45458 and previous config saved to /var/cache/conftool/dbconfig/20230308-110306-marostegui.json
  • 11:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 11:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 11:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45457 and previous config saved to /var/cache/conftool/dbconfig/20230308-110121-marostegui.json
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45456 and previous config saved to /var/cache/conftool/dbconfig/20230308-105347-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45455 and previous config saved to /var/cache/conftool/dbconfig/20230308-105339-marostegui.json
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:52 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:51 otto@deploy2002: Finished deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334] (duration: 08m 20s)
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T329203)', diff saved to https://phabricator.wikimedia.org/P45454 and previous config saved to /var/cache/conftool/dbconfig/20230308-105043-marostegui.json
  • 10:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45453 and previous config saved to /var/cache/conftool/dbconfig/20230308-105022-marostegui.json
  • 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:49 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 10:48 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:42 otto@deploy2002: Started deploy [analytics/refinery@eb29334]: Regular analytics weekly train [analytics/refinery@eb29334]
  • 10:40 jayme@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45452 and previous config saved to /var/cache/conftool/dbconfig/20230308-103840-marostegui.json
  • 10:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P45451 and previous config saved to /var/cache/conftool/dbconfig/20230308-103833-marostegui.json
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45450 and previous config saved to /var/cache/conftool/dbconfig/20230308-103515-marostegui.json
  • 10:28 jayme@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45449 and previous config saved to /var/cache/conftool/dbconfig/20230308-102334-marostegui.json
  • 10:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T329260)', diff saved to https://phabricator.wikimedia.org/P45448 and previous config saved to /var/cache/conftool/dbconfig/20230308-102326-marostegui.json
  • 10:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45447 and previous config saved to /var/cache/conftool/dbconfig/20230308-102009-marostegui.json
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45446 and previous config saved to /var/cache/conftool/dbconfig/20230308-101944-marostegui.json
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45445 and previous config saved to /var/cache/conftool/dbconfig/20230308-100826-marostegui.json
  • 10:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45444 and previous config saved to /var/cache/conftool/dbconfig/20230308-100502-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45443 and previous config saved to /var/cache/conftool/dbconfig/20230308-100437-marostegui.json
  • 09:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T328817)', diff saved to https://phabricator.wikimedia.org/P45442 and previous config saved to /var/cache/conftool/dbconfig/20230308-095804-marostegui.json
  • 09:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 09:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45441 and previous config saved to /var/cache/conftool/dbconfig/20230308-095742-marostegui.json
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T329203)', diff saved to https://phabricator.wikimedia.org/P45440 and previous config saved to /var/cache/conftool/dbconfig/20230308-095320-marostegui.json
  • 09:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45439 and previous config saved to /var/cache/conftool/dbconfig/20230308-095259-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P45438 and previous config saved to /var/cache/conftool/dbconfig/20230308-094931-marostegui.json
  • 09:45 claime: Rebuilding production-images for 894687
  • 09:43 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45437 and previous config saved to /var/cache/conftool/dbconfig/20230308-094236-marostegui.json
  • 09:42 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:41 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45436 and previous config saved to /var/cache/conftool/dbconfig/20230308-093752-marostegui.json
  • 09:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45435 and previous config saved to /var/cache/conftool/dbconfig/20230308-093424-marostegui.json
  • 09:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T329260)', diff saved to https://phabricator.wikimedia.org/P45434 and previous config saved to /var/cache/conftool/dbconfig/20230308-093106-marostegui.json
  • 09:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45433 and previous config saved to /var/cache/conftool/dbconfig/20230308-093045-marostegui.json
  • 09:30 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
  • 09:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45432 and previous config saved to /var/cache/conftool/dbconfig/20230308-092729-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45431 and previous config saved to /var/cache/conftool/dbconfig/20230308-092246-marostegui.json
  • 09:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45430 and previous config saved to /var/cache/conftool/dbconfig/20230308-091538-marostegui.json
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45429 and previous config saved to /var/cache/conftool/dbconfig/20230308-091223-marostegui.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45428 and previous config saved to /var/cache/conftool/dbconfig/20230308-090739-marostegui.json
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T328817)', diff saved to https://phabricator.wikimedia.org/P45426 and previous config saved to /var/cache/conftool/dbconfig/20230308-090156-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45425 and previous config saved to /var/cache/conftool/dbconfig/20230308-090134-marostegui.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P45424 and previous config saved to /var/cache/conftool/dbconfig/20230308-090031-marostegui.json
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T329203)', diff saved to https://phabricator.wikimedia.org/P45423 and previous config saved to /var/cache/conftool/dbconfig/20230308-085608-marostegui.json
  • 08:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45422 and previous config saved to /var/cache/conftool/dbconfig/20230308-085546-marostegui.json
  • 08:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:53 akosiaris: remove 10.64.64.0/21 and 10.192.64.0/21 from calico GlobalNetworkPolicies T326617
  • 08:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P45421 and previous config saved to /var/cache/conftool/dbconfig/20230308-085159-root.json
  • 08:50 vgutierrez: re-enable HAProxy systemd service unit hardening in ulsfo - T323944
  • 08:49 moritzm: installing git security updates
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45420 and previous config saved to /var/cache/conftool/dbconfig/20230308-084628-marostegui.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45419 and previous config saved to /var/cache/conftool/dbconfig/20230308-084525-marostegui.json
  • 08:41 marostegui: Deploy schema change on s3 eqiad dbmaint T329203
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T329260)', diff saved to https://phabricator.wikimedia.org/P45418 and previous config saved to /var/cache/conftool/dbconfig/20230308-084053-marostegui.json
  • 08:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45417 and previous config saved to /var/cache/conftool/dbconfig/20230308-084040-marostegui.json
  • 08:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1113:3315', diff saved to https://phabricator.wikimedia.org/P45416 and previous config saved to /var/cache/conftool/dbconfig/20230308-083843-marostegui.json
  • 08:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1113:3315 (re)pooling @ 15%: Repooling', diff saved to https://phabricator.wikimedia.org/P45415 and previous config saved to /var/cache/conftool/dbconfig/20230308-083731-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P45414 and previous config saved to /var/cache/conftool/dbconfig/20230308-083654-root.json
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P45413 and previous config saved to /var/cache/conftool/dbconfig/20230308-083618-marostegui.json
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 08:34 marostegui: Deploy schema change on s3 eqiad dbmaint T329260
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 14 hosts with reason: Schema change
  • 08:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 14 hosts with reason: Schema change
  • 08:32 marostegui: Deploy schema change on s5 eqiad dbmaint T329260
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45412 and previous config saved to /var/cache/conftool/dbconfig/20230308-083121-marostegui.json
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45411 and previous config saved to /var/cache/conftool/dbconfig/20230308-082533-marostegui.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P45410 and previous config saved to /var/cache/conftool/dbconfig/20230308-082149-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45409 and previous config saved to /var/cache/conftool/dbconfig/20230308-082112-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T329260)', diff saved to https://phabricator.wikimedia.org/P45408 and previous config saved to /var/cache/conftool/dbconfig/20230308-081809-marostegui.json
  • 08:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45407 and previous config saved to /var/cache/conftool/dbconfig/20230308-081748-marostegui.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45406 and previous config saved to /var/cache/conftool/dbconfig/20230308-081614-marostegui.json
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 19 hosts with reason: Schema change
  • 08:15 marostegui: Deploy schema change on s8 eqiad dbmaint T329260
  • 08:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 19 hosts with reason: Schema change
  • 08:15 marostegui: Deploy schema change on s7 eqiad dbmaint T329260
  • 08:15 marostegui: Deploy schema change on s4 eqiad dbmaint T329260
  • 08:15 marostegui: Deploy schema change on s1 eqiad dbmaint T329260
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Schema change
  • 08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Schema change
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2093.codfw.wmnet
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45405 and previous config saved to /var/cache/conftool/dbconfig/20230308-081027-marostegui.json
  • 08:09 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2093.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 08:07 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P45404 and previous config saved to /var/cache/conftool/dbconfig/20230308-080644-root.json
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T328817)', diff saved to https://phabricator.wikimedia.org/P45403 and previous config saved to /var/cache/conftool/dbconfig/20230308-080431-marostegui.json
  • 08:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 08:02 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2093.codfw.wmnet
  • 08:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45402 and previous config saved to /var/cache/conftool/dbconfig/20230308-080241-marostegui.json
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 20 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 20 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 22 hosts with reason: Schema change
  • 08:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 22 hosts with reason: Schema change
  • 07:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T329203)', diff saved to https://phabricator.wikimedia.org/P45401 and previous config saved to /var/cache/conftool/dbconfig/20230308-075857-marostegui.json
  • 07:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 07:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P45400 and previous config saved to /var/cache/conftool/dbconfig/20230308-075139-root.json
  • 07:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 07:47 taavi@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard (duration: 04m 56s)
  • 07:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 07:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P45399 and previous config saved to /var/cache/conftool/dbconfig/20230308-074735-marostegui.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1109', diff saved to https://phabricator.wikimedia.org/P45398 and previous config saved to /var/cache/conftool/dbconfig/20230308-074427-marostegui.json
  • 07:42 taavi@deploy2002: Started deploy [horizon/deploy@9d02cd6]: updating wmf-sudo-dashboard
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P45397 and previous config saved to /var/cache/conftool/dbconfig/20230308-073633-root.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 07:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45396 and previous config saved to /var/cache/conftool/dbconfig/20230308-073228-marostegui.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1109 T330991', diff saved to https://phabricator.wikimedia.org/P45395 and previous config saved to /var/cache/conftool/dbconfig/20230308-073110-root.json
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1126 to s8 primary T330991', diff saved to https://phabricator.wikimedia.org/P45394 and previous config saved to /var/cache/conftool/dbconfig/20230308-073005-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T329260)', diff saved to https://phabricator.wikimedia.org/P45393 and previous config saved to /var/cache/conftool/dbconfig/20230308-072932-marostegui.json
  • 07:29 marostegui: Starting s8 eqiad failover from db1109 to db1126 - T330991
  • 07:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P45392 and previous config saved to /var/cache/conftool/dbconfig/20230308-072128-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1126 with weight 0 T330991', diff saved to https://phabricator.wikimedia.org/P45391 and previous config saved to /var/cache/conftool/dbconfig/20230308-070544-root.json
  • 07:05 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
  • 07:05 root@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T330991
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45390 and previous config saved to /var/cache/conftool/dbconfig/20230308-070458-marostegui.json
  • 07:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 07:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 07:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 06:53 marostegui: Failover m3 from db1101 to db1159 - T331387
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
  • 06:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331387
  • 06:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45389 and previous config saved to /var/cache/conftool/dbconfig/20230308-055038-marostegui.json
  • 05:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45388 and previous config saved to /var/cache/conftool/dbconfig/20230308-053531-marostegui.json
  • 05:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P45387 and previous config saved to /var/cache/conftool/dbconfig/20230308-052024-marostegui.json
  • 05:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45386 and previous config saved to /var/cache/conftool/dbconfig/20230308-050517-marostegui.json
  • 04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T329260)', diff saved to https://phabricator.wikimedia.org/P45385 and previous config saved to /var/cache/conftool/dbconfig/20230308-040451-marostegui.json
  • 04:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 04:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45384 and previous config saved to /var/cache/conftool/dbconfig/20230308-040430-marostegui.json
  • 03:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45383 and previous config saved to /var/cache/conftool/dbconfig/20230308-034923-marostegui.json
  • 03:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P45382 and previous config saved to /var/cache/conftool/dbconfig/20230308-033416-marostegui.json
  • 03:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45381 and previous config saved to /var/cache/conftool/dbconfig/20230308-031910-marostegui.json
  • 03:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45380 and previous config saved to /var/cache/conftool/dbconfig/20230308-031257-marostegui.json
  • 03:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 03:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45379 and previous config saved to /var/cache/conftool/dbconfig/20230308-031246-marostegui.json
  • 02:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45378 and previous config saved to /var/cache/conftool/dbconfig/20230308-025739-marostegui.json
  • 02:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45377 and previous config saved to /var/cache/conftool/dbconfig/20230308-024536-marostegui.json
  • 02:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P45376 and previous config saved to /var/cache/conftool/dbconfig/20230308-024233-marostegui.json
  • 02:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45375 and previous config saved to /var/cache/conftool/dbconfig/20230308-023029-marostegui.json
  • 02:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45374 and previous config saved to /var/cache/conftool/dbconfig/20230308-022726-marostegui.json
  • 02:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T329260)', diff saved to https://phabricator.wikimedia.org/P45373 and previous config saved to /var/cache/conftool/dbconfig/20230308-022116-marostegui.json
  • 02:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 02:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 02:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45372 and previous config saved to /var/cache/conftool/dbconfig/20230308-022054-marostegui.json
  • 02:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P45371 and previous config saved to /var/cache/conftool/dbconfig/20230308-021523-marostegui.json
  • 02:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45370 and previous config saved to /var/cache/conftool/dbconfig/20230308-020547-marostegui.json
  • 02:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45369 and previous config saved to /var/cache/conftool/dbconfig/20230308-020016-marostegui.json
  • 01:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45368 and previous config saved to /var/cache/conftool/dbconfig/20230308-015921-marostegui.json
  • 01:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P45367 and previous config saved to /var/cache/conftool/dbconfig/20230308-015040-marostegui.json
  • 01:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T329203)', diff saved to https://phabricator.wikimedia.org/P45366 and previous config saved to /var/cache/conftool/dbconfig/20230308-014659-marostegui.json
  • 01:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:46 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 01:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45365 and previous config saved to /var/cache/conftool/dbconfig/20230308-014637-marostegui.json
  • 01:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45364 and previous config saved to /var/cache/conftool/dbconfig/20230308-014415-marostegui.json
  • 01:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45363 and previous config saved to /var/cache/conftool/dbconfig/20230308-013534-marostegui.json
  • 01:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45362 and previous config saved to /var/cache/conftool/dbconfig/20230308-013131-marostegui.json
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45361 and previous config saved to /var/cache/conftool/dbconfig/20230308-012918-marostegui.json
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P45360 and previous config saved to /var/cache/conftool/dbconfig/20230308-012908-marostegui.json
  • 01:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45359 and previous config saved to /var/cache/conftool/dbconfig/20230308-012901-marostegui.json
  • 01:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P45358 and previous config saved to /var/cache/conftool/dbconfig/20230308-011624-marostegui.json
  • 01:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45357 and previous config saved to /var/cache/conftool/dbconfig/20230308-011401-marostegui.json
  • 01:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45356 and previous config saved to /var/cache/conftool/dbconfig/20230308-011354-marostegui.json
  • 01:09 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1002.eqiad.wmnet
  • 01:08 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1002.eqiad.wmnet with OS bullseye
  • 01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45355 and previous config saved to /var/cache/conftool/dbconfig/20230308-010321-marostegui.json
  • 01:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 01:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 01:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45354 and previous config saved to /var/cache/conftool/dbconfig/20230308-010300-marostegui.json
  • 01:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45353 and previous config saved to /var/cache/conftool/dbconfig/20230308-010117-marostegui.json
  • 00:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P45352 and previous config saved to /var/cache/conftool/dbconfig/20230308-005848-marostegui.json
  • 00:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 00:51 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1002.eqiad.wmnet with reason: host reimage
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45351 and previous config saved to /var/cache/conftool/dbconfig/20230308-004753-marostegui.json
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T329203)', diff saved to https://phabricator.wikimedia.org/P45350 and previous config saved to /var/cache/conftool/dbconfig/20230308-004744-marostegui.json
  • 00:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 00:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45349 and previous config saved to /var/cache/conftool/dbconfig/20230308-004722-marostegui.json
  • 00:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45348 and previous config saved to /var/cache/conftool/dbconfig/20230308-004341-marostegui.json
  • 00:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T329260)', diff saved to https://phabricator.wikimedia.org/P45347 and previous config saved to /var/cache/conftool/dbconfig/20230308-004115-marostegui.json
  • 00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45346 and previous config saved to /var/cache/conftool/dbconfig/20230308-004049-marostegui.json
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P45345 and previous config saved to /var/cache/conftool/dbconfig/20230308-003240-marostegui.json
  • 00:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45344 and previous config saved to /var/cache/conftool/dbconfig/20230308-003216-marostegui.json
  • 00:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
  • 00:29 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir1002.eqiad.wmnet with OS bullseye
  • 00:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45343 and previous config saved to /var/cache/conftool/dbconfig/20230308-002543-marostegui.json
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45342 and previous config saved to /var/cache/conftool/dbconfig/20230308-001734-marostegui.json
  • 00:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P45341 and previous config saved to /var/cache/conftool/dbconfig/20230308-001709-marostegui.json
  • 00:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P45340 and previous config saved to /var/cache/conftool/dbconfig/20230308-001036-marostegui.json
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T328817)', diff saved to https://phabricator.wikimedia.org/P45339 and previous config saved to /var/cache/conftool/dbconfig/20230308-000538-marostegui.json
  • 00:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45338 and previous config saved to /var/cache/conftool/dbconfig/20230308-000516-marostegui.json
  • 00:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45337 and previous config saved to /var/cache/conftool/dbconfig/20230308-000203-marostegui.json

2023-03-07

  • 23:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45336 and previous config saved to /var/cache/conftool/dbconfig/20230307-235529-marostegui.json
  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45335 and previous config saved to /var/cache/conftool/dbconfig/20230307-235010-marostegui.json
  • 23:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T329260)', diff saved to https://phabricator.wikimedia.org/P45334 and previous config saved to /var/cache/conftool/dbconfig/20230307-234858-marostegui.json
  • 23:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 23:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45333 and previous config saved to /var/cache/conftool/dbconfig/20230307-234837-marostegui.json
  • 23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T329203)', diff saved to https://phabricator.wikimedia.org/P45332 and previous config saved to /var/cache/conftool/dbconfig/20230307-234741-marostegui.json
  • 23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 23:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45331 and previous config saved to /var/cache/conftool/dbconfig/20230307-234715-marostegui.json
  • 23:40 ryankemper@deploy2002: Finished deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper (duration: 00m 15s)
  • 23:39 ryankemper@deploy2002: Started deploy [airflow-dags/search@3419b7d]: initial deployment to new search platform airflow 2 instance - ryankemper
  • 23:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P45329 and previous config saved to /var/cache/conftool/dbconfig/20230307-233503-marostegui.json
  • 23:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45328 and previous config saved to /var/cache/conftool/dbconfig/20230307-233330-marostegui.json
  • 23:32 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1002.eqiad.wmnet with OS bullseye
  • 23:32 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1002.eqiad.wmnet
  • 23:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45327 and previous config saved to /var/cache/conftool/dbconfig/20230307-233209-marostegui.json
  • 23:31 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2002.codfw.wmnet
  • 23:30 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir1001.eqiad.wmnet
  • 23:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45326 and previous config saved to /var/cache/conftool/dbconfig/20230307-231957-marostegui.json
  • 23:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P45325 and previous config saved to /var/cache/conftool/dbconfig/20230307-231824-marostegui.json
  • 23:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P45324 and previous config saved to /var/cache/conftool/dbconfig/20230307-231702-marostegui.json
  • 23:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45323 and previous config saved to /var/cache/conftool/dbconfig/20230307-230317-marostegui.json
  • 23:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45322 and previous config saved to /var/cache/conftool/dbconfig/20230307-230156-marostegui.json
  • 22:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T329260)', diff saved to https://phabricator.wikimedia.org/P45321 and previous config saved to /var/cache/conftool/dbconfig/20230307-225951-marostegui.json
  • 22:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 22:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 22:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2002.codfw.wmnet with OS bullseye
  • 22:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45319 and previous config saved to /var/cache/conftool/dbconfig/20230307-225110-marostegui.json
  • 22:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45318 and previous config saved to /var/cache/conftool/dbconfig/20230307-224803-marostegui.json
  • 22:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 22:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45317 and previous config saved to /var/cache/conftool/dbconfig/20230307-224742-marostegui.json
  • 22:44 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir1001.eqiad.wmnet with OS bullseye
  • 22:39 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 22:36 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2002.codfw.wmnet with reason: host reimage
  • 22:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45316 and previous config saved to /var/cache/conftool/dbconfig/20230307-223603-marostegui.json
  • 22:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45315 and previous config saved to /var/cache/conftool/dbconfig/20230307-223235-marostegui.json
  • 22:31 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 22:26 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir1001.eqiad.wmnet with reason: host reimage
  • 22:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2002.codfw.wmnet with OS bullseye
  • 22:26 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2002.codfw.wmnet
  • 22:25 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir2001.codfw.wmnet
  • 22:23 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir2001.codfw.wmnet with OS bullseye
  • 22:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P45314 and previous config saved to /var/cache/conftool/dbconfig/20230307-222056-marostegui.json
  • 22:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T328817)', diff saved to https://phabricator.wikimedia.org/P45313 and previous config saved to /var/cache/conftool/dbconfig/20230307-221931-marostegui.json
  • 22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 22:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 22:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45312 and previous config saved to /var/cache/conftool/dbconfig/20230307-221854-marostegui.json
  • 22:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P45311 and previous config saved to /var/cache/conftool/dbconfig/20230307-221729-marostegui.json
  • 22:14 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir1001.eqiad.wmnet with OS bullseye
  • 22:14 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir1001.eqiad.wmnet
  • 22:13 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4002.ulsfo.wmnet
  • 22:13 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4002.ulsfo.wmnet with OS bullseye
  • 22:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 22:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir2001.codfw.wmnet with reason: host reimage
  • 22:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45310 and previous config saved to /var/cache/conftool/dbconfig/20230307-220550-marostegui.json
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T329260)', diff saved to https://phabricator.wikimedia.org/P45309 and previous config saved to /var/cache/conftool/dbconfig/20230307-220438-marostegui.json
  • 22:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45308 and previous config saved to /var/cache/conftool/dbconfig/20230307-220416-marostegui.json
  • 22:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45307 and previous config saved to /var/cache/conftool/dbconfig/20230307-220348-marostegui.json
  • 22:03 mforns@deploy2002: Finished deploy [airflow-dags/analytics@9fba86b]: (no justification provided) (duration: 00m 18s)
  • 22:03 mforns@deploy2002: Started deploy [airflow-dags/analytics@9fba86b]: (no justification provided)
  • 22:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45306 and previous config saved to /var/cache/conftool/dbconfig/20230307-220222-marostegui.json
  • 21:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6002.drmrs.wmnet with OS bullseye
  • 21:58 inflatador: bking@cumin2002 depool elastic row D hosts to prepare for T322082
  • 21:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 7 hosts with reason: re-rack
  • 21:56 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 7 hosts with reason: re-rack
  • 21:56 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir2001.codfw.wmnet with OS bullseye
  • 21:56 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir2001.codfw.wmnet
  • 21:55 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 21:54 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3002.esams.wmnet
  • 21:54 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3002.esams.wmnet with OS bullseye
  • 21:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4002.ulsfo.wmnet with reason: host reimage
  • 21:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45305 and previous config saved to /var/cache/conftool/dbconfig/20230307-214910-marostegui.json
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P45304 and previous config saved to /var/cache/conftool/dbconfig/20230307-214841-marostegui.json
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T329203)', diff saved to https://phabricator.wikimedia.org/P45303 and previous config saved to /var/cache/conftool/dbconfig/20230307-214824-marostegui.json
  • 21:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45302 and previous config saved to /var/cache/conftool/dbconfig/20230307-214802-marostegui.json
  • 21:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 21:43 TheresNoTime: close UTC late backport window
  • 21:42 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6002.drmrs.wmnet with reason: host reimage
  • 21:41 inflatador: bking@cumin2002 ban elastic row D hosts to prepare for T322082
  • 21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2073.codfw.wmnet with OS bullseye
  • 21:40 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4002.ulsfo.wmnet with OS bullseye
  • 21:38 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4002.ulsfo.wmnet
  • 21:37 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir4001.ulsfo.wmnet
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir4001.ulsfo.wmnet with OS bullseye
  • 21:35 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
  • 21:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P45301 and previous config saved to /var/cache/conftool/dbconfig/20230307-213403-marostegui.json
  • 21:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45300 and previous config saved to /var/cache/conftool/dbconfig/20230307-213334-marostegui.json
  • 21:33 samtar@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) (duration: 09m 11s)
  • 21:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45299 and previous config saved to /var/cache/conftool/dbconfig/20230307-213256-marostegui.json
  • 21:32 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3002.esams.wmnet with reason: host reimage
  • 21:27 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:27 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6002.drmrs.wmnet with OS bullseye
  • 21:25 samtar@deploy2002: sbailey and samtar: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 21:23 samtar@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group1 wikis (T299612)
  • 21:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2147 (T328817)', diff saved to https://phabricator.wikimedia.org/P45298 and previous config saved to /var/cache/conftool/dbconfig/20230307-212138-marostegui.json
  • 21:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 21:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 21:20 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.26 refs T330204
  • 21:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 21:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45297 and previous config saved to /var/cache/conftool/dbconfig/20230307-211857-marostegui.json
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P45296 and previous config saved to /var/cache/conftool/dbconfig/20230307-211749-marostegui.json
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T329260)', diff saved to https://phabricator.wikimedia.org/P45295 and previous config saved to /var/cache/conftool/dbconfig/20230307-211744-marostegui.json
  • 21:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 21:17 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3002.esams.wmnet with OS bullseye
  • 21:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45294 and previous config saved to /var/cache/conftool/dbconfig/20230307-211723-marostegui.json
  • 21:17 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir4001.ulsfo.wmnet with reason: host reimage
  • 21:17 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3002.esams.wmnet
  • 21:16 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir3001.esams.wmnet
  • 21:15 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir3001.esams.wmnet with OS bullseye
  • 21:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 21:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45293 and previous config saved to /var/cache/conftool/dbconfig/20230307-211159-marostegui.json
  • 21:10 bblack: lvs500[45]: re-enabling/pooling, back to normal flow
  • 21:10 jhuneidi@deploy2002: Pruned MediaWiki: 1.40.0-wmf.24 (duration: 02m 08s)
  • 21:07 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 43m 53s)
  • 21:07 bking@deploy2002: Finished deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk (duration: 00m 41s)
  • 21:07 bking@deploy2002: Started deploy [airflow-dags/search@d533716]: initial deployment to search platform airflow 2 instance-bk
  • 21:06 bblack: lvs500[45]: disabling puppet and stopping pybal, all eqsin traffic through lvs5006 temporarily...
  • 21:03 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir4001.ulsfo.wmnet with OS bullseye
  • 21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.ulsfo.wmnet
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45292 and previous config saved to /var/cache/conftool/dbconfig/20230307-210243-marostegui.json
  • 21:02 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir4001.drmrs.wmnet
  • 21:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45291 and previous config saved to /var/cache/conftool/dbconfig/20230307-210216-marostegui.json
  • 20:58 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: test deploy new airflow instance (duration: 02m 03s)
  • 20:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45290 and previous config saved to /var/cache/conftool/dbconfig/20230307-205653-marostegui.json
  • 20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
  • 20:56 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 01s)
  • 20:56 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:53 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
  • 20:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir3001.esams.wmnet with reason: host reimage
  • 20:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2073.codfw.wmnet with reason: host reimage
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T329203)', diff saved to https://phabricator.wikimedia.org/P45289 and previous config saved to /var/cache/conftool/dbconfig/20230307-204925-marostegui.json
  • 20:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45288 and previous config saved to /var/cache/conftool/dbconfig/20230307-204904-marostegui.json
  • 20:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P45287 and previous config saved to /var/cache/conftool/dbconfig/20230307-204710-marostegui.json
  • 20:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P45286 and previous config saved to /var/cache/conftool/dbconfig/20230307-204146-marostegui.json
  • 20:35 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir3001.esams.wmnet with OS bullseye
  • 20:35 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir3001.drmrs.wmnet
  • 20:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45284 and previous config saved to /var/cache/conftool/dbconfig/20230307-203357-marostegui.json
  • 20:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45283 and previous config saved to /var/cache/conftool/dbconfig/20230307-203203-marostegui.json
  • 20:30 ebernhardson@deploy2002: deploy aborted: test deploy new airflow instance (duration: 00m 02s)
  • 20:30 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: test deploy new airflow instance
  • 20:30 ebernhardson@deploy2002: Finished deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance (duration: 00m 05s)
  • 20:29 ebernhardson@deploy2002: Started deploy [wikimedia/discovery/analytics@c8dc6d5]: test deploy old airflow instance
  • 20:29 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2073.codfw.wmnet with OS bullseye
  • 20:27 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 20:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T329260)', diff saved to https://phabricator.wikimedia.org/P45282 and previous config saved to /var/cache/conftool/dbconfig/20230307-202713-marostegui.json
  • 20:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 20:27 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45281 and previous config saved to /var/cache/conftool/dbconfig/20230307-202652-marostegui.json
  • 20:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45280 and previous config saved to /var/cache/conftool/dbconfig/20230307-202640-marostegui.json
  • 20:24 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 20:21 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6002.eqsin.wmnet
  • 20:19 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6002.drmrs.wmnet with OS bullseye
  • 20:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P45279 and previous config saved to /var/cache/conftool/dbconfig/20230307-201851-marostegui.json
  • 20:17 bking@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk (duration: 01m 18s)
  • 20:16 bking@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance-bk
  • 20:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45277 and previous config saved to /var/cache/conftool/dbconfig/20230307-201414-marostegui.json
  • 20:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 20:14 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 01m 49s)
  • 20:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 20:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45276 and previous config saved to /var/cache/conftool/dbconfig/20230307-201353-marostegui.json
  • 20:12 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
  • 20:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45274 and previous config saved to /var/cache/conftool/dbconfig/20230307-201145-marostegui.json
  • 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45273 and previous config saved to /var/cache/conftool/dbconfig/20230307-200344-marostegui.json
  • 20:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 19:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45272 and previous config saved to /var/cache/conftool/dbconfig/20230307-195846-marostegui.json
  • 19:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6002.drmrs.wmnet with reason: host reimage
  • 19:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P45270 and previous config saved to /var/cache/conftool/dbconfig/20230307-195639-marostegui.json
  • 19:51 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5002.eqsin.wmnet with OS bullseye
  • 19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T329203)', diff saved to https://phabricator.wikimedia.org/P45268 and previous config saved to /var/cache/conftool/dbconfig/20230307-194934-marostegui.json
  • 19:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 19:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45267 and previous config saved to /var/cache/conftool/dbconfig/20230307-194913-marostegui.json
  • 19:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P45266 and previous config saved to /var/cache/conftool/dbconfig/20230307-194340-marostegui.json
  • 19:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45265 and previous config saved to /var/cache/conftool/dbconfig/20230307-194132-marostegui.json
  • 19:40 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance (duration: 00m 07s)
  • 19:40 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6002.drmrs.wmnet with OS bullseye
  • 19:40 ebernhardson@deploy2002: Started deploy [airflow-dags/search@9924c93]: initial deployment to search platform airflow 2 instance
  • 19:40 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6002.eqsin.wmnet
  • 19:40 ejegg: payments-wiki upgraded from 346e6f61 to 05a5e09a
  • 19:39 jhuneidi@deploy2002: scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki=aawiki --force-version "1.40.0-wmf.26" --no-progress --store-class=LCStoreCDB --threads=30 --lang en --quiet ' returned non-zero exit status 255. (duration: 00m 02s)
  • 19:39 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 19:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
  • 19:37 brett@cumin2002: conftool action : set/pooled=yess; selector: name=ncredir6001.eqsin.wmnet
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45264 and previous config saved to /var/cache/conftool/dbconfig/20230307-193639-marostegui.json
  • 19:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 19:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45263 and previous config saved to /var/cache/conftool/dbconfig/20230307-193617-marostegui.json
  • 19:35 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
  • 19:35 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4002.ulsfo.wmnet with OS bullseye
  • 19:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45262 and previous config saved to /var/cache/conftool/dbconfig/20230307-193406-marostegui.json
  • 19:32 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:31 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 19:29 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5002.eqsin.wmnet with reason: host reimage
  • 19:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45261 and previous config saved to /var/cache/conftool/dbconfig/20230307-192833-marostegui.json
  • 19:22 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 19:21 brett@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host ncredir6001.drmrs.wmnet with OS bullseye
  • 19:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45260 and previous config saved to /var/cache/conftool/dbconfig/20230307-192111-marostegui.json
  • 19:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P45259 and previous config saved to /var/cache/conftool/dbconfig/20230307-191900-marostegui.json
  • 19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 19:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T328817)', diff saved to https://phabricator.wikimedia.org/P45258 and previous config saved to /var/cache/conftool/dbconfig/20230307-191717-marostegui.json
  • 19:17 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4002.ulsfo.wmnet with reason: host reimage
  • 19:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 19:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45257 and previous config saved to /var/cache/conftool/dbconfig/20230307-191656-marostegui.json
  • 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2072.codfw.wmnet with OS bullseye
  • 19:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:08 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5002.eqsin.wmnet with OS bullseye
  • 19:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4002.ulsfo.wmnet with OS bullseye
  • 19:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P45256 and previous config saved to /var/cache/conftool/dbconfig/20230307-190604-marostegui.json
  • 19:04 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 19:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45255 and previous config saved to /var/cache/conftool/dbconfig/20230307-190353-marostegui.json
  • 19:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 19:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45254 and previous config saved to /var/cache/conftool/dbconfig/20230307-190149-marostegui.json
  • 19:01 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum5001.eqsin.wmnet with OS bullseye
  • 18:59 jhuneidi@deploy2002: Finished scap: testwikis wikis to 1.40.0-wmf.26 refs T330204 (duration: 12m 38s)
  • 18:57 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir6001.drmrs.wmnet with reason: host reimage
  • 18:56 sukhe@cumin2002: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host durum6001.drmrs.wmnet with OS bullseye
  • 18:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45253 and previous config saved to /var/cache/conftool/dbconfig/20230307-185058-marostegui.json
  • 18:49 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
  • 18:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T329203)', diff saved to https://phabricator.wikimedia.org/P45252 and previous config saved to /var/cache/conftool/dbconfig/20230307-184907-marostegui.json
  • 18:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 18:48 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:47 jhuneidi@deploy2002: Started scap: testwikis wikis to 1.40.0-wmf.26 refs T330204
  • 18:46 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2072.codfw.wmnet with reason: host reimage
  • 18:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P45251 and previous config saved to /var/cache/conftool/dbconfig/20230307-184642-marostegui.json
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T329260)', diff saved to https://phabricator.wikimedia.org/P45250 and previous config saved to /var/cache/conftool/dbconfig/20230307-184506-marostegui.json
  • 18:45 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45249 and previous config saved to /var/cache/conftool/dbconfig/20230307-184428-marostegui.json
  • 18:43 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:40 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:39 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir6001.drmrs.wmnet with OS bullseye
  • 18:39 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir6001.eqsin.wmnet
  • 18:39 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir6001.eqsin.wmnet
  • 18:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum5001.eqsin.wmnet with reason: host reimage
  • 18:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 18:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45248 and previous config saved to /var/cache/conftool/dbconfig/20230307-183810-marostegui.json
  • 18:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 18:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:35 brett@cumin2002: conftool action : set/pooled=yes; selector: name=ncredir5002.eqsin.wmnet
  • 18:32 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum4001.ulsfo.wmnet with reason: host reimage
  • 18:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45247 and previous config saved to /var/cache/conftool/dbconfig/20230307-183136-marostegui.json
  • 18:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45246 and previous config saved to /var/cache/conftool/dbconfig/20230307-182921-marostegui.json
  • 18:29 dancy: dancy@deploy2002: Fixing up /srv/mediawiki-staging/.git permissions
  • 18:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2072.codfw.wmnet with OS bullseye
  • 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2071.codfw.wmnet with OS bullseye
  • 18:26 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45245 and previous config saved to /var/cache/conftool/dbconfig/20230307-182304-marostegui.json
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T328817)', diff saved to https://phabricator.wikimedia.org/P45244 and previous config saved to /var/cache/conftool/dbconfig/20230307-182035-marostegui.json
  • 18:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:20 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 18:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 18:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45243 and previous config saved to /var/cache/conftool/dbconfig/20230307-182013-marostegui.json
  • 18:19 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum5001.eqsin.wmnet with OS bullseye
  • 18:18 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum4001.ulsfo.wmnet with OS bullseye
  • 18:17 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3002.esams.wmnet with OS bullseye
  • 18:16 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5002.eqsin.wmnet with OS bullseye
  • 18:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P45242 and previous config saved to /var/cache/conftool/dbconfig/20230307-181414-marostegui.json
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 18:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P45241 and previous config saved to /var/cache/conftool/dbconfig/20230307-180757-marostegui.json
  • 18:05 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 18:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45240 and previous config saved to /var/cache/conftool/dbconfig/20230307-180506-marostegui.json
  • 18:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
  • 17:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45239 and previous config saved to /var/cache/conftool/dbconfig/20230307-175907-marostegui.json
  • 17:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3002.esams.wmnet with reason: host reimage
  • 17:53 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45238 and previous config saved to /var/cache/conftool/dbconfig/20230307-175314-marostegui.json
  • 17:53 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45237 and previous config saved to /var/cache/conftool/dbconfig/20230307-175251-marostegui.json
  • 17:51 inflatador: bking@cumin2002 repool wdqs hosts post-maintenance T329073
  • 17:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P45236 and previous config saved to /var/cache/conftool/dbconfig/20230307-175000-marostegui.json
  • 17:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 17:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 17:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45235 and previous config saved to /var/cache/conftool/dbconfig/20230307-174848-marostegui.json
  • 17:47 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 17:47 volans@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 17:47 volans@cumin1001: START - Cookbook sre.network.cf
  • 17:44 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5002.eqsin.wmnet with reason: host reimage
  • 17:40 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3002.esams.wmnet with OS bullseye
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T329203)', diff saved to https://phabricator.wikimedia.org/P45234 and previous config saved to /var/cache/conftool/dbconfig/20230307-173923-marostegui.json
  • 17:39 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45233 and previous config saved to /var/cache/conftool/dbconfig/20230307-173901-marostegui.json
  • 17:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45232 and previous config saved to /var/cache/conftool/dbconfig/20230307-173453-marostegui.json
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45231 and previous config saved to /var/cache/conftool/dbconfig/20230307-173341-marostegui.json
  • 17:31 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum3001.esams.wmnet with OS bullseye
  • 17:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45229 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T328817)', diff saved to https://phabricator.wikimedia.org/P45230 and previous config saved to /var/cache/conftool/dbconfig/20230307-172354-marostegui.json
  • 17:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:23 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2002.codfw.wmnet with OS bullseye
  • 17:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45228 and previous config saved to /var/cache/conftool/dbconfig/20230307-172333-marostegui.json
  • 17:22 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5002.eqsin.wmnet with OS bullseye
  • 17:21 brett@cumin2002: conftool action : set/pooled=no; selector: name=ncredir5002.eqsin.wmnet
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P45227 and previous config saved to /var/cache/conftool/dbconfig/20230307-171834-marostegui.json
  • 17:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
  • 17:12 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3001.esams.wmnet with reason: host reimage
  • 17:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P45226 and previous config saved to /var/cache/conftool/dbconfig/20230307-170848-marostegui.json
  • 17:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45225 and previous config saved to /var/cache/conftool/dbconfig/20230307-170826-marostegui.json
  • 17:06 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2002.codfw.wmnet with reason: host reimage
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45224 and previous config saved to /var/cache/conftool/dbconfig/20230307-170328-marostegui.json
  • 17:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T329260)', diff saved to https://phabricator.wikimedia.org/P45223 and previous config saved to /var/cache/conftool/dbconfig/20230307-170215-marostegui.json
  • 17:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 17:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45222 and previous config saved to /var/cache/conftool/dbconfig/20230307-170154-marostegui.json
  • 16:58 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs,name=eqiad
  • 16:57 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum3001.esams.wmnet with OS bullseye
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45221 and previous config saved to /var/cache/conftool/dbconfig/20230307-165340-marostegui.json
  • 16:53 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2002.codfw.wmnet with OS bullseye
  • 16:53 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@9924c93]: (no justification provided) (duration: 00m 11s)
  • 16:53 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@9924c93]: (no justification provided)
  • 16:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P45220 and previous config saved to /var/cache/conftool/dbconfig/20230307-165319-marostegui.json
  • 16:52 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum2001.codfw.wmnet with OS bullseye
  • 16:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
  • 16:47 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2071.codfw.wmnet with reason: host reimage
  • 16:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45219 and previous config saved to /var/cache/conftool/dbconfig/20230307-164647-marostegui.json
  • 16:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T329203)', diff saved to https://phabricator.wikimedia.org/P45218 and previous config saved to /var/cache/conftool/dbconfig/20230307-164010-marostegui.json
  • 16:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 16:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 16:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45217 and previous config saved to /var/cache/conftool/dbconfig/20230307-163948-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45216 and previous config saved to /var/cache/conftool/dbconfig/20230307-163813-marostegui.json
  • 16:36 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum2001.codfw.wmnet with reason: host reimage
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122', diff saved to https://phabricator.wikimedia.org/P45215 and previous config saved to /var/cache/conftool/dbconfig/20230307-163140-marostegui.json
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T328817)', diff saved to https://phabricator.wikimedia.org/P45214 and previous config saved to /var/cache/conftool/dbconfig/20230307-162616-marostegui.json
  • 16:26 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 16:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 16:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45213 and previous config saved to /var/cache/conftool/dbconfig/20230307-162554-marostegui.json
  • 16:25 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum2001.codfw.wmnet with OS bullseye
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2071.codfw.wmnet with OS bullseye
  • 16:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45212 and previous config saved to /var/cache/conftool/dbconfig/20230307-162442-marostegui.json
  • 16:23 akosiaris@cumin1001: conftool action : set/pooled=yes; selector: service=kubesvc,name=kubernetes2016.codfw.wmnet
  • 16:21 brett@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ncredir5001.eqsin.wmnet with OS bullseye
  • 16:17 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1037']
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45211 and previous config saved to /var/cache/conftool/dbconfig/20230307-161634-marostegui.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1122 (T329260)', diff saved to https://phabricator.wikimedia.org/P45210 and previous config saved to /var/cache/conftool/dbconfig/20230307-161132-marostegui.json
  • 16:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45209 and previous config saved to /var/cache/conftool/dbconfig/20230307-161111-marostegui.json
  • 16:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45208 and previous config saved to /var/cache/conftool/dbconfig/20230307-161047-marostegui.json
  • 16:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P45207 and previous config saved to /var/cache/conftool/dbconfig/20230307-160935-marostegui.json
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
  • 16:04 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS bullseye
  • 16:01 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:56 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45206 and previous config saved to /var/cache/conftool/dbconfig/20230307-155604-marostegui.json
  • 15:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106', diff saved to https://phabricator.wikimedia.org/P45205 and previous config saved to /var/cache/conftool/dbconfig/20230307-155541-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45204 and previous config saved to /var/cache/conftool/dbconfig/20230307-155428-marostegui.json
  • 15:53 marostegui: Failover m1-master T330165
  • 15:52 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 15:49 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncredir5001.eqsin.wmnet with reason: host reimage
  • 15:46 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 15:46 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 15:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 15:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1040']
  • 15:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1022.eqiad.wmnet with reason: host reimage
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P45203 and previous config saved to /var/cache/conftool/dbconfig/20230307-154058-marostegui.json
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T329203)', diff saved to https://phabricator.wikimedia.org/P45202 and previous config saved to /var/cache/conftool/dbconfig/20230307-154049-marostegui.json
  • 15:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45201 and previous config saved to /var/cache/conftool/dbconfig/20230307-154034-marostegui.json
  • 15:40 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 15:36 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1002.eqiad.wmnet with OS bullseye
  • 15:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 15:30 sukhe@cumin2002: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host durum1001.eqiad.wmnet with OS bullseye
  • 15:29 moritzm: installing libde265 security updates
  • 15:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:28 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1022.eqiad.wmnet with OS bullseye
  • 15:28 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T328817)', diff saved to https://phabricator.wikimedia.org/P45200 and previous config saved to /var/cache/conftool/dbconfig/20230307-152729-marostegui.json
  • 15:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 15:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 15:26 brett@cumin2002: START - Cookbook sre.ganeti.reimage for host ncredir5001.eqsin.wmnet with OS bullseye
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 15:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 15:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45199 and previous config saved to /var/cache/conftool/dbconfig/20230307-152545-marostegui.json
  • 15:25 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
  • 15:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: sync
  • 15:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
  • 15:23 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/similar-users: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: sync
  • 15:22 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/similar-users: sync
  • 15:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/tegola-vector-tiles: sync
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: sync
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: sync
  • 15:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1039']
  • 15:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/tegola-vector-tiles: sync
  • 15:20 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: sync
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: sync
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: sync
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T329260)', diff saved to https://phabricator.wikimedia.org/P45198 and previous config saved to /var/cache/conftool/dbconfig/20230307-152037-marostegui.json
  • 15:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1039']
  • 15:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: sync
  • 15:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:19 Emperor: pool thanos-fe1001 T329073
  • 15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: sync
  • 15:19 mvernon@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 15:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: sync
  • 15:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 15:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 15:16 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1001.eqiad.wmnet with reason: host reimage
  • 15:16 Emperor: pool ms-fe1009 T329073
  • 15:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 15:16 Emperor: pool moss-fe1001 T329073
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 15:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: sync
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: sync
  • 15:11 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1039']
  • 15:11 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: sync
  • 15:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: sync
  • 15:06 sukhe@cumin2002: START - Cookbook sre.ganeti.reimage for host durum1001.eqiad.wmnet with OS bullseye
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: sync
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: sync
  • 15:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: sync
  • 15:04 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 15:04 bblack: dns1001 - restarted prometheus-bird-exporter
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: sync
  • 15:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/SERVICE_NAME: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/SERVICE_NAME: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: sync
  • 15:02 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: sync
  • 15:01 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1021.eqiad.wmnet with reason: host reimage
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:01 sukhe: repooling dns1001: authdns-update can now be run again
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:01 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: sync
  • 15:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
  • 14:59 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:59 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase101[69].eqiad.wmnet
  • 14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase102[18].eqiad.wmnet
  • 14:58 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: name=restbase1031.eqiad.wmnet
  • 14:58 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: sync on main
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: sync
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: sync
  • 14:58 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 14:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 14:56 inflatador: bking@cumin2002 unban production row A elastic nodes from all clusters T329073
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 14:56 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: sync
  • 14:55 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: sync
  • 14:54 akosiaris: T331126 toolhub deployed, https://toolhub.wikimedia.org/ operational again
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 14:53 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 14:52 inflatador: bking@cumin2002 unban row A cloudelastic nodes T329073
  • 14:47 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1021.eqiad.wmnet with OS bullseye
  • 14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T331126
  • 14:45 akosiaris: uncordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet
  • 14:44 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 238 hosts
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:42 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for 238 hosts
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mr1-eqiad
  • 14:42 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for mr1-eqiad
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:41 moritzm: enabling Puppet in eqiad/esams/drmrs after completed Switch maintenance T329073
  • 14:40 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:40 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:38 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:36 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:29 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:26 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:25 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:25 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:24 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:24 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:21 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:20 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:20 topranks: issuing reboot to upgrade asw2-a-eqiad virtual-chassis to Junos 21.4
  • 14:20 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:19 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1038']
  • 14:17 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 14:16 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
  • 14:16 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on mr1-eqiad with reason: eqiad row A upgrade
  • 14:15 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1037']
  • 14:13 akosiaris: kubectl cordon kubernetes{1005,1007,1008,1017,1018}.eqiad.wmnet T329073
  • 14:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2070.codfw.wmnet with OS bullseye
  • 14:12 mvernon@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:09 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
  • 14:09 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1038']
  • 14:09 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1038']
  • 14:08 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 14:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1020.eqiad.wmnet with reason: host reimage
  • 14:07 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1037']
  • 14:07 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 238 hosts with reason: eqiad row A upgrade
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase1031.eqiad.wmnet
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase102[18].eqiad.wmnet
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: name=restbase101[69].eqiad.wmnet
  • 14:02 mvernon@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1001"
  • 13:59 jbond: failover pki.discovery.wmnet to codfw T329073
  • 13:58 Emperor: depool thanos-fe1001 T329073
  • 13:55 Emperor: depool ms-fe1009 T329073
  • 13:55 Emperor: depool moss-fe1001 T329073
  • 13:54 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1020.eqiad.wmnet with OS bullseye
  • 13:50 moritzm: disabling Puppet in eqiad/esams/drmrs for forthcoming Switch maintenance T329073
  • 13:50 topranks: staging Junos files to individual VC members eqiad row A to prep for upgrade
  • 13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:15 otto@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 13:04 moritzm: drain ganeti1011 for eventual reimage to Bullseye T311687
  • 13:00 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 12:57 sukhe: removing dns1001 from authdns_servers for T329073
  • 12:55 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 12:52 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1019.eqiad.wmnet with reason: host reimage
  • 12:44 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 12:41 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1018.eqiad.wmnet with reason: host reimage
  • 12:38 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1019.eqiad.wmnet with OS bullseye
  • 12:37 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 12:27 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1018.eqiad.wmnet with OS bullseye
  • 12:25 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 12:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 12:19 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:17 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1017.eqiad.wmnet with reason: host reimage
  • 12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:15 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:15 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:15 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:14 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 12:13 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:13 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:13 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:12 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:11 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:10 jayme@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
  • 12:09 jayme@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:09 jayme@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:09 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:08 jayme@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:08 jayme@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:07 jayme@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:06 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 12:06 jayme@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1015.eqiad.wmnet with reason: host reimage
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:06 jayme@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:06 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:05 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:03 mvernon@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 12:03 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1017.eqiad.wmnet with OS bullseye
  • 12:01 jayme@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:01 jayme@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:00 jayme@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:59 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
  • 11:56 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1016.eqiad.wmnet with reason: host reimage
  • 11:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 11:47 mvernon@cumin1001: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 11:45 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:44 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:43 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 11:42 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 11:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 11:37 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 11:36 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 11:33 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 11:29 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 11:28 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubernetes1005.eqiad.wmnet with OS bullseye
  • 11:28 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 11:24 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 11:23 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1006.eqiad.wmnet with OS bullseye
  • 11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 11:21 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 11:19 akosiaris@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 11:17 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 11:14 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45193 and previous config saved to /var/cache/conftool/dbconfig/20230307-111421-marostegui.json
  • 11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1014.eqiad.wmnet with reason: host reimage
  • 11:14 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1013.eqiad.wmnet with reason: host reimage
  • 11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1012.eqiad.wmnet with reason: host reimage
  • 11:13 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1011.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1010.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1009.eqiad.wmnet with reason: host reimage
  • 11:12 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 11:11 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1008.eqiad.wmnet with reason: host reimage
  • 11:09 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
  • 11:08 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1007.eqiad.wmnet with reason: host reimage
  • 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1006.eqiad.wmnet with reason: host reimage
  • 11:06 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubernetes1005.eqiad.wmnet with reason: host reimage
  • 11:05 akosiaris@cumin1001: END (FAIL) - Cookbook sre.ganeti.reimage (exit_code=99) for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1014.eqiad.wmnet with OS bullseye
  • 11:00 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1013.eqiad.wmnet with OS bullseye
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1012.eqiad.wmnet with OS bullseye
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45192 and previous config saved to /var/cache/conftool/dbconfig/20230307-105914-marostegui.json
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1011.eqiad.wmnet with OS bullseye
  • 10:59 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1010.eqiad.wmnet with OS bullseye
  • 10:58 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1009.eqiad.wmnet with OS bullseye
  • 10:57 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1008.eqiad.wmnet with OS bullseye
  • 10:56 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1016.eqiad.wmnet with OS bullseye
  • 10:55 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1015.eqiad.wmnet with OS bullseye
  • 10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1006.eqiad.wmnet with OS bullseye
  • 10:54 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubernetes1005.eqiad.wmnet with OS bullseye
  • 10:53 akosiaris@cumin1001: START - Cookbook sre.hosts.reimage for host kubernetes1007.eqiad.wmnet with OS bullseye
  • 10:51 akosiaris: manually label kubemaster1001, kubemaster1002 giving them role master T307943
  • 10:48 arturo: apt2001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
  • 10:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P45191 and previous config saved to /var/cache/conftool/dbconfig/20230307-104408-marostegui.json
  • 10:39 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1001.eqiad.wmnet with OS bullseye
  • 10:38 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubemaster1002.eqiad.wmnet with OS bullseye
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45190 and previous config saved to /var/cache/conftool/dbconfig/20230307-102901-marostegui.json
  • 10:28 arturo: apt1001: pull latest packages for thirdparty/kubeadm-k8s-1-22 buster-wikimedia (T286856)
  • 10:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
  • 10:19 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1002.eqiad.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubemaster1001.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T329203)', diff saved to https://phabricator.wikimedia.org/P45189 and previous config saved to /var/cache/conftool/dbconfig/20230307-100807-marostegui.json
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45188 and previous config saved to /var/cache/conftool/dbconfig/20230307-100745-marostegui.json
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1002.eqiad.wmnet with OS bullseye
  • 10:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubemaster1001.eqiad.wmnet with OS bullseye
  • 10:05 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1005.eqiad.wmnet with OS bullseye
  • 09:54 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1006.eqiad.wmnet with OS bullseye
  • 09:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45187 and previous config saved to /var/cache/conftool/dbconfig/20230307-095239-marostegui.json
  • 09:39 akosiaris: schedule downtime for PyBal backends health on lvs1019, lvs1020
  • 09:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P45186 and previous config saved to /var/cache/conftool/dbconfig/20230307-093732-marostegui.json
  • 09:35 akosiaris@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host kubetcd1004.eqiad.wmnet with OS bullseye
  • 09:33 moritzm: installing apr-util security updates on Bullseye
  • 09:23 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45184 and previous config saved to /var/cache/conftool/dbconfig/20230307-092226-marostegui.json
  • 09:21 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
  • 09:18 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1006.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1004.eqiad.wmnet with reason: host reimage
  • 09:16 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubetcd1005.eqiad.wmnet with reason: host reimage
  • 09:14 moritzm: installing PHP 7.4 security updates (as packaged in Debian Bullseye, not our internal build for Buster)
  • 09:07 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1006.eqiad.wmnet with OS bullseye
  • 09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1005.eqiad.wmnet with OS bullseye
  • 09:06 akosiaris@cumin1001: START - Cookbook sre.ganeti.reimage for host kubetcd1004.eqiad.wmnet with OS bullseye
  • 09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=blubberoid,name=eqiad
  • 09:02 akosiaris@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=wdqs,name=eqiad
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45182 and previous config saved to /var/cache/conftool/dbconfig/20230307-090130-marostegui.json
  • 09:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45181 and previous config saved to /var/cache/conftool/dbconfig/20230307-090109-marostegui.json
  • 08:51 akosiaris: T331126 Scheduled 24H downtime for all wikikube eqiad hosts and all LVS services powered by the cluster
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45180 and previous config saved to /var/cache/conftool/dbconfig/20230307-084602-marostegui.json
  • 08:43 dcausse: closing the UTC morning backport window
  • 08:42 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1003.eqiad.wmnet with OS bullseye
  • 08:41 dcausse@deploy2002: Finished scap: Backport for Properly pass the page id on page moves (T331127) (duration: 16m 34s)
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'Remove db1101 from dbctl T329352', diff saved to https://phabricator.wikimedia.org/P45179 and previous config saved to /var/cache/conftool/dbconfig/20230307-083542-marostegui.json
  • 08:34 akosiaris@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
  • 08:33 akosiaris@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 23 hosts with reason: Reinitialize eqiad with k8s 1.23
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P45178 and previous config saved to /var/cache/conftool/dbconfig/20230307-083056-marostegui.json
  • 08:28 dcausse@deploy2002: dcausse: Backport for Properly pass the page id on page moves (T331127) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 08:24 dcausse@deploy2002: Started scap: Backport for Properly pass the page id on page moves (T331127)
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:23 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:22 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:22 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-codfw
  • 08:20 marostegui: Failover m3 from db1159 to db1101 - T331384
  • 08:20 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 08:19 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1003.eqiad.wmnet with reason: host reimage
  • 08:18 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-codfw
  • 08:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45177 and previous config saved to /var/cache/conftool/dbconfig/20230307-081549-marostegui.json
  • 08:15 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:14 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:14 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 08:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2134,2160].codfw.wmnet,db[1101,1117,1159].eqiad.wmnet with reason: m3 master switchover T331384
  • 08:09 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 08:07 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1003.eqiad.wmnet with OS bullseye
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P45176 and previous config saved to /var/cache/conftool/dbconfig/20230307-075453-marostegui.json
  • 07:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45175 and previous config saved to /var/cache/conftool/dbconfig/20230307-075443-marostegui.json
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45174 and previous config saved to /var/cache/conftool/dbconfig/20230307-073936-marostegui.json
  • 07:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
  • 07:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 15 hosts with reason: Row A switch maintenance T329073
  • 07:34 vgutierrez: enable haproxy systemd service unit hardening in cp4044 - T323944
  • 07:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
  • 07:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[2142-2144].codfw.wmnet with reason: Row A switch maintenance T329073
  • 07:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db[1151-1153].eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1115.eqiad.wmnet with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
  • 07:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on 6 hosts with reason: Row A switch maintenance T329073
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1101 (s7,s8) T331381', diff saved to https://phabricator.wikimedia.org/P45172 and previous config saved to /var/cache/conftool/dbconfig/20230307-072454-root.json
  • 07:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P45171 and previous config saved to /var/cache/conftool/dbconfig/20230307-072429-marostegui.json
  • 07:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45170 and previous config saved to /var/cache/conftool/dbconfig/20230307-070923-marostegui.json
  • 06:54 marostegui: dbmaint eqiad s1 T329203
  • 06:53 marostegui: dbmaint eqiad s4 T329203
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T329203)', diff saved to https://phabricator.wikimedia.org/P45169 and previous config saved to /var/cache/conftool/dbconfig/20230307-064752-marostegui.json
  • 06:47 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45168 and previous config saved to /var/cache/conftool/dbconfig/20230307-064730-marostegui.json
  • 06:43 marostegui: dbmaint eqiad s4 T328817
  • 06:43 marostegui: dbmaint eqiad s1 T328817
  • 06:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
  • 06:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 34 hosts with reason: Schema change on s4 eqiad
  • 06:40 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
  • 06:39 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on 37 hosts with reason: Schema change on s1 eqiad
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2095.codfw.wmnet
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:36 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:34 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2095.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 06:32 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45167 and previous config saved to /var/cache/conftool/dbconfig/20230307-063223-marostegui.json
  • 06:28 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2095.codfw.wmnet
  • 06:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 06:20 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P45166 and previous config saved to /var/cache/conftool/dbconfig/20230307-061717-marostegui.json
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 06:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45165 and previous config saved to /var/cache/conftool/dbconfig/20230307-060210-marostegui.json
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T329203)', diff saved to https://phabricator.wikimedia.org/P45164 and previous config saved to /var/cache/conftool/dbconfig/20230307-054153-marostegui.json
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45163 and previous config saved to /var/cache/conftool/dbconfig/20230307-054127-marostegui.json
  • 05:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45162 and previous config saved to /var/cache/conftool/dbconfig/20230307-052620-marostegui.json
  • 05:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P45161 and previous config saved to /var/cache/conftool/dbconfig/20230307-051113-marostegui.json
  • 04:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45160 and previous config saved to /var/cache/conftool/dbconfig/20230307-045607-marostegui.json
  • 03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T329203)', diff saved to https://phabricator.wikimedia.org/P45159 and previous config saved to /var/cache/conftool/dbconfig/20230307-035541-marostegui.json
  • 03:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45158 and previous config saved to /var/cache/conftool/dbconfig/20230307-035520-marostegui.json
  • 03:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45157 and previous config saved to /var/cache/conftool/dbconfig/20230307-034013-marostegui.json
  • 03:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P45156 and previous config saved to /var/cache/conftool/dbconfig/20230307-032506-marostegui.json
  • 03:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45155 and previous config saved to /var/cache/conftool/dbconfig/20230307-031000-marostegui.json
  • 02:49 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T329203)', diff saved to https://phabricator.wikimedia.org/P45154 and previous config saved to /var/cache/conftool/dbconfig/20230307-024912-marostegui.json
  • 02:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 02:48 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 02:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45153 and previous config saved to /var/cache/conftool/dbconfig/20230307-024850-marostegui.json
  • 02:34 eileen: civicrm upgraded from fe2c06f6 to dbe3b716
  • 02:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45152 and previous config saved to /var/cache/conftool/dbconfig/20230307-023344-marostegui.json
  • 02:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P45151 and previous config saved to /var/cache/conftool/dbconfig/20230307-021837-marostegui.json
  • 02:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45150 and previous config saved to /var/cache/conftool/dbconfig/20230307-020330-marostegui.json
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T329203)', diff saved to https://phabricator.wikimedia.org/P45149 and previous config saved to /var/cache/conftool/dbconfig/20230307-014152-marostegui.json
  • 01:41 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 01:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45148 and previous config saved to /var/cache/conftool/dbconfig/20230307-014130-marostegui.json
  • 01:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45147 and previous config saved to /var/cache/conftool/dbconfig/20230307-012624-marostegui.json
  • 01:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P45146 and previous config saved to /var/cache/conftool/dbconfig/20230307-011117-marostegui.json
  • 00:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45145 and previous config saved to /var/cache/conftool/dbconfig/20230307-005611-marostegui.json
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2154 (T329203)', diff saved to https://phabricator.wikimedia.org/P45144 and previous config saved to /var/cache/conftool/dbconfig/20230307-003547-marostegui.json
  • 00:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 00:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45143 and previous config saved to /var/cache/conftool/dbconfig/20230307-003525-marostegui.json
  • 00:23 mutante: people* - determined which users had a public_html dir in eqiad but not in codfw; created that dir and rsynced (via push) from people1003 to people2002 for the 7 affected users. Re-enabled the temporarily disabled puppet to restore the live-hacked rsync config. T330091
  • 00:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45142 and previous config saved to /var/cache/conftool/dbconfig/20230307-002019-marostegui.json
  • 00:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P45141 and previous config saved to /var/cache/conftool/dbconfig/20230307-000512-marostegui.json

2023-03-06

  • 23:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45140 and previous config saved to /var/cache/conftool/dbconfig/20230306-235006-marostegui.json
  • 23:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T329203)', diff saved to https://phabricator.wikimedia.org/P45139 and previous config saved to /var/cache/conftool/dbconfig/20230306-232933-marostegui.json
  • 23:29 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:29 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
  • 23:20 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wcqs1001.eqiad.wmnet,wdqs[1003-1004,1006,1011].eqiad.wmnet with reason: switch maintenance
  • 23:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
  • 23:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 12 hosts with reason: switch maintenance
  • 23:16 inflatador: bking@cumin2002 ban row A cloudelastic hosts T329073
  • 23:11 mforns@deploy2002: Finished deploy [airflow-dags/analytics@53a0280]: (no justification provided) (duration: 00m 17s)
  • 23:11 mforns@deploy2002: Started deploy [airflow-dags/analytics@53a0280]: (no justification provided)
  • 23:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 23:05 ryankemper: T329073 Pre-emptively depooled internal wdqs hosts `wdqs10[03,11]`
  • 23:04 inflatador: bking@cumin2002 'depool wcqs and wdqs row A hosts T329073'
  • 22:49 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 22:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45138 and previous config saved to /var/cache/conftool/dbconfig/20230306-223044-marostegui.json
  • 22:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45137 and previous config saved to /var/cache/conftool/dbconfig/20230306-221537-marostegui.json
  • 22:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P45136 and previous config saved to /var/cache/conftool/dbconfig/20230306-220031-marostegui.json
  • 21:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45135 and previous config saved to /var/cache/conftool/dbconfig/20230306-214524-marostegui.json
  • 21:45 herron@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 21:24 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T329203)', diff saved to https://phabricator.wikimedia.org/P45133 and previous config saved to /var/cache/conftool/dbconfig/20230306-212358-marostegui.json
  • 21:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 21:23 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 21:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45132 and previous config saved to /var/cache/conftool/dbconfig/20230306-212336-marostegui.json
  • 21:19 zabe@deploy2002: Finished scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) (duration: 16m 59s)
  • 21:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45131 and previous config saved to /var/cache/conftool/dbconfig/20230306-210829-marostegui.json
  • 21:04 zabe@deploy2002: zabe and sbailey: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:02 zabe@deploy2002: Started scap: Backport for Enable new Linter UI for namespace, tag and template for group0 wikis (T299612)
  • 20:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P45130 and previous config saved to /var/cache/conftool/dbconfig/20230306-205322-marostegui.json
  • 20:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45129 and previous config saved to /var/cache/conftool/dbconfig/20230306-203816-marostegui.json
  • 20:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T329203)', diff saved to https://phabricator.wikimedia.org/P45128 and previous config saved to /var/cache/conftool/dbconfig/20230306-201704-marostegui.json
  • 20:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 20:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45127 and previous config saved to /var/cache/conftool/dbconfig/20230306-201643-marostegui.json
  • 20:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45126 and previous config saved to /var/cache/conftool/dbconfig/20230306-200843-marostegui.json
  • 20:04 herron@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 20:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45125 and previous config saved to /var/cache/conftool/dbconfig/20230306-200354-marostegui.json
  • 20:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45124 and previous config saved to /var/cache/conftool/dbconfig/20230306-200136-marostegui.json
  • 19:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45123 and previous config saved to /var/cache/conftool/dbconfig/20230306-195336-marostegui.json
  • 19:51 derick@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 19:49 derick@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 19:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45122 and previous config saved to /var/cache/conftool/dbconfig/20230306-194848-marostegui.json
  • 19:48 derick@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 19:47 derick@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 19:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P45121 and previous config saved to /var/cache/conftool/dbconfig/20230306-194630-marostegui.json
  • 19:45 derick@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 19:44 derick@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
  • 19:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P45120 and previous config saved to /var/cache/conftool/dbconfig/20230306-193829-marostegui.json
  • 19:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P45119 and previous config saved to /var/cache/conftool/dbconfig/20230306-193341-marostegui.json
  • 19:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45118 and previous config saved to /var/cache/conftool/dbconfig/20230306-193123-marostegui.json
  • 19:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45117 and previous config saved to /var/cache/conftool/dbconfig/20230306-192322-marostegui.json
  • 19:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45116 and previous config saved to /var/cache/conftool/dbconfig/20230306-191835-marostegui.json
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45115 and previous config saved to /var/cache/conftool/dbconfig/20230306-191622-marostegui.json
  • 19:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 19:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45114 and previous config saved to /var/cache/conftool/dbconfig/20230306-191600-marostegui.json
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T329203)', diff saved to https://phabricator.wikimedia.org/P45113 and previous config saved to /var/cache/conftool/dbconfig/20230306-190943-marostegui.json
  • 19:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45112 and previous config saved to /var/cache/conftool/dbconfig/20230306-190921-marostegui.json
  • 19:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45111 and previous config saved to /var/cache/conftool/dbconfig/20230306-190054-marostegui.json
  • 18:56 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1036']
  • 18:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T328817)', diff saved to https://phabricator.wikimedia.org/P45110 and previous config saved to /var/cache/conftool/dbconfig/20230306-185559-marostegui.json
  • 18:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45109 and previous config saved to /var/cache/conftool/dbconfig/20230306-185537-marostegui.json
  • 18:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45108 and previous config saved to /var/cache/conftool/dbconfig/20230306-185415-marostegui.json
  • 18:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P45107 and previous config saved to /var/cache/conftool/dbconfig/20230306-184547-marostegui.json
  • 18:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45106 and previous config saved to /var/cache/conftool/dbconfig/20230306-184030-marostegui.json
  • 18:40 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P45105 and previous config saved to /var/cache/conftool/dbconfig/20230306-183908-marostegui.json
  • 18:38 mutante: phabricator - locked and archived project acl*discovery-repository-admins (T324171)
  • 18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:34 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:34 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:33 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45104 and previous config saved to /var/cache/conftool/dbconfig/20230306-183040-marostegui.json
  • 18:25 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1036']
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P45103 and previous config saved to /var/cache/conftool/dbconfig/20230306-182524-marostegui.json
  • 18:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45102 and previous config saved to /var/cache/conftool/dbconfig/20230306-182508-marostegui.json
  • 18:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45101 and previous config saved to /var/cache/conftool/dbconfig/20230306-182447-marostegui.json
  • 18:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45100 and previous config saved to /var/cache/conftool/dbconfig/20230306-182402-marostegui.json
  • 18:23 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:21 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1035']
  • 18:12 cmjohnson@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1035']
  • 18:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45099 and previous config saved to /var/cache/conftool/dbconfig/20230306-181017-marostegui.json
  • 18:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45098 and previous config saved to /var/cache/conftool/dbconfig/20230306-180940-marostegui.json
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T329203)', diff saved to https://phabricator.wikimedia.org/P45097 and previous config saved to /var/cache/conftool/dbconfig/20230306-180249-marostegui.json
  • 18:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 18:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45096 and previous config saved to /var/cache/conftool/dbconfig/20230306-180228-marostegui.json
  • 17:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P45095 and previous config saved to /var/cache/conftool/dbconfig/20230306-175433-marostegui.json
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T328817)', diff saved to https://phabricator.wikimedia.org/P45094 and previous config saved to /var/cache/conftool/dbconfig/20230306-175254-marostegui.json
  • 17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45093 and previous config saved to /var/cache/conftool/dbconfig/20230306-175218-marostegui.json
  • 17:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45092 and previous config saved to /var/cache/conftool/dbconfig/20230306-174721-marostegui.json
  • 17:42 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:41 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45091 and previous config saved to /var/cache/conftool/dbconfig/20230306-173927-marostegui.json
  • 17:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45090 and previous config saved to /var/cache/conftool/dbconfig/20230306-173711-marostegui.json
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P45089 and previous config saved to /var/cache/conftool/dbconfig/20230306-173350-marostegui.json
  • 17:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45088 and previous config saved to /var/cache/conftool/dbconfig/20230306-173328-marostegui.json
  • 17:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P45087 and previous config saved to /var/cache/conftool/dbconfig/20230306-173215-marostegui.json
  • 17:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P45086 and previous config saved to /var/cache/conftool/dbconfig/20230306-172205-marostegui.json
  • 17:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45085 and previous config saved to /var/cache/conftool/dbconfig/20230306-171821-marostegui.json
  • 17:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45084 and previous config saved to /var/cache/conftool/dbconfig/20230306-171708-marostegui.json
  • 17:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45083 and previous config saved to /var/cache/conftool/dbconfig/20230306-170657-marostegui.json
  • 17:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P45082 and previous config saved to /var/cache/conftool/dbconfig/20230306-170315-marostegui.json
  • 16:54 andrew@deploy2002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759 (duration: 05m 19s)
  • 16:49 andrew@deploy2002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names (take two) -- T330759
  • 16:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45081 and previous config saved to /var/cache/conftool/dbconfig/20230306-164808-marostegui.json
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T329260)', diff saved to https://phabricator.wikimedia.org/P45080 and previous config saved to /var/cache/conftool/dbconfig/20230306-164245-marostegui.json
  • 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:42 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-codfw
  • 16:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45079 and previous config saved to /var/cache/conftool/dbconfig/20230306-164158-marostegui.json
  • 16:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T328817)', diff saved to https://phabricator.wikimedia.org/P45078 and previous config saved to /var/cache/conftool/dbconfig/20230306-163806-marostegui.json
  • 16:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:32 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-codfw
  • 16:29 volans@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45077 and previous config saved to /var/cache/conftool/dbconfig/20230306-162651-marostegui.json
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T329203)', diff saved to https://phabricator.wikimedia.org/P45076 and previous config saved to /var/cache/conftool/dbconfig/20230306-161652-marostegui.json
  • 16:16 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45075 and previous config saved to /var/cache/conftool/dbconfig/20230306-161631-marostegui.json
  • 16:13 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45074 and previous config saved to /var/cache/conftool/dbconfig/20230306-161321-marostegui.json
  • 16:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P45073 and previous config saved to /var/cache/conftool/dbconfig/20230306-161144-marostegui.json
  • 16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2014.codfw.wmnet
  • 16:05 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2014.codfw.wmnet
  • 16:05 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2014.codfw.wmnet
  • 16:04 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2014.codfw.wmnet
  • 16:03 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ms-fe2013.codfw.wmnet
  • 16:02 eevans@puppetmaster1001: conftool action : set/weight=40; selector: name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
  • 16:01 eevans@puppetmaster1001: conftool action : set/pooled=yes; selector: service=swift,name=ms-fe2013.codfw.wmnet
  • 16:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45072 and previous config saved to /var/cache/conftool/dbconfig/20230306-160124-marostegui.json
  • 15:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45071 and previous config saved to /var/cache/conftool/dbconfig/20230306-155815-marostegui.json
  • 15:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45070 and previous config saved to /var/cache/conftool/dbconfig/20230306-155638-marostegui.json
  • 15:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T329260)', diff saved to https://phabricator.wikimedia.org/P45069 and previous config saved to /var/cache/conftool/dbconfig/20230306-155428-marostegui.json
  • 15:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 15:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45068 and previous config saved to /var/cache/conftool/dbconfig/20230306-155030-marostegui.json
  • 15:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P45067 and previous config saved to /var/cache/conftool/dbconfig/20230306-154618-marostegui.json
  • 15:45 otto@deploy2002: Finished deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided) (duration: 01m 25s)
  • 15:44 otto@deploy2002: Started deploy [analytics/refinery@ee8981b] (hadoop-test): (no justification provided)
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P45066 and previous config saved to /var/cache/conftool/dbconfig/20230306-154308-marostegui.json
  • 15:40 otto@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided) (duration: 01m 27s)
  • 15:39 otto@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): (no justification provided)
  • 15:36 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2014.codfw.wmnet
  • 15:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45065 and previous config saved to /var/cache/conftool/dbconfig/20230306-153524-marostegui.json
  • 15:35 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
  • 15:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45064 and previous config saved to /var/cache/conftool/dbconfig/20230306-153111-marostegui.json
  • 15:30 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
  • 15:30 volans@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on ml-serve1007.eqiad.wmnet with reason: testing provision cookbook
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2014.codfw.wmnet
  • 15:29 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45063 and previous config saved to /var/cache/conftool/dbconfig/20230306-152801-marostegui.json
  • 15:26 eevans@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe2013.codfw.wmnet
  • 15:26 mforns@deploy2002: Finished deploy [airflow-dags/analytics@2fa7484]: (no justification provided) (duration: 00m 17s)
  • 15:25 mforns@deploy2002: Started deploy [airflow-dags/analytics@2fa7484]: (no justification provided)
  • 15:25 volans@cumin1001: START - Cookbook sre.hosts.provision for host ml-serve1007.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:23 zabe@deploy2002: Finished scap: Backport for Add logo for azwikimedia and vewikimedia (T331177) (duration: 08m 33s)
  • 15:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P45062 and previous config saved to /var/cache/conftool/dbconfig/20230306-152017-marostegui.json
  • 15:17 eevans@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-fe2013.codfw.wmnet
  • 15:16 zabe@deploy2002: zabe: Backport for Add logo for azwikimedia and vewikimedia (T331177) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:14 zabe@deploy2002: Started scap: Backport for Add logo for azwikimedia and vewikimedia (T331177)
  • 15:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T329203)', diff saved to https://phabricator.wikimedia.org/P45061 and previous config saved to /var/cache/conftool/dbconfig/20230306-150956-marostegui.json
  • 15:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:08 Lucas_WMDE: UTC afternoon backport+config window done
  • 15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 15:06 lucaswerkmeister-wmde@deploy2002: helmfile [codfw] START helmfile.d/services/termbox: apply
  • 15:05 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45060 and previous config saved to /var/cache/conftool/dbconfig/20230306-150510-marostegui.json
  • 15:04 lucaswerkmeister-wmde@deploy2002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:02 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 15:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T328817)', diff saved to https://phabricator.wikimedia.org/P45059 and previous config saved to /var/cache/conftool/dbconfig/20230306-150115-marostegui.json
  • 15:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45058 and previous config saved to /var/cache/conftool/dbconfig/20230306-150054-marostegui.json
  • 14:59 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T329260)', diff saved to https://phabricator.wikimedia.org/P45057 and previous config saved to /var/cache/conftool/dbconfig/20230306-145945-marostegui.json
  • 14:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45056 and previous config saved to /var/cache/conftool/dbconfig/20230306-145924-marostegui.json
  • 14:57 herron: failing grafana over to codfw T329073
  • 14:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45055 and previous config saved to /var/cache/conftool/dbconfig/20230306-145052-marostegui.json
  • 14:50 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:49 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45054 and previous config saved to /var/cache/conftool/dbconfig/20230306-144547-marostegui.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45053 and previous config saved to /var/cache/conftool/dbconfig/20230306-144417-marostegui.json
  • 14:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45051 and previous config saved to /var/cache/conftool/dbconfig/20230306-143546-marostegui.json
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P45050 and previous config saved to /var/cache/conftool/dbconfig/20230306-143041-marostegui.json
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P45049 and previous config saved to /var/cache/conftool/dbconfig/20230306-142910-marostegui.json
  • 14:25 lucaswerkmeister-wmde@deploy2002: helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P45048 and previous config saved to /var/cache/conftool/dbconfig/20230306-142039-marostegui.json
  • 14:16 sukhe: running authdns-update for CR 894652
  • 14:15 lucaswerkmeister-wmde@deploy2002: helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45047 and previous config saved to /var/cache/conftool/dbconfig/20230306-141534-marostegui.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45046 and previous config saved to /var/cache/conftool/dbconfig/20230306-141404-marostegui.json
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45045 and previous config saved to /var/cache/conftool/dbconfig/20230306-140533-marostegui.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T329260)', diff saved to https://phabricator.wikimedia.org/P45044 and previous config saved to /var/cache/conftool/dbconfig/20230306-140339-marostegui.json
  • 14:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45043 and previous config saved to /var/cache/conftool/dbconfig/20230306-140317-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T328817)', diff saved to https://phabricator.wikimedia.org/P45042 and previous config saved to /var/cache/conftool/dbconfig/20230306-134820-marostegui.json
  • 13:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45041 and previous config saved to /var/cache/conftool/dbconfig/20230306-134811-marostegui.json
  • 13:48 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:47 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 13:40 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe1001.eqiad.wmnet,service=thanos-web
  • 13:40 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 13:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45040 and previous config saved to /var/cache/conftool/dbconfig/20230306-133451-marostegui.json
  • 13:34 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-restbase (exit_code=0) rolling restart_daemons on A:restbase-canary
  • 13:34 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-restbase rolling restart_daemons on A:restbase-canary
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P45039 and previous config saved to /var/cache/conftool/dbconfig/20230306-133304-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45038 and previous config saved to /var/cache/conftool/dbconfig/20230306-131945-marostegui.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45037 and previous config saved to /var/cache/conftool/dbconfig/20230306-131758-marostegui.json
  • 13:15 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T329260)', diff saved to https://phabricator.wikimedia.org/P45036 and previous config saved to /var/cache/conftool/dbconfig/20230306-131545-marostegui.json
  • 13:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 13:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45035 and previous config saved to /var/cache/conftool/dbconfig/20230306-131214-marostegui.json
  • 13:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T329203)', diff saved to https://phabricator.wikimedia.org/P45034 and previous config saved to /var/cache/conftool/dbconfig/20230306-130933-marostegui.json
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 13:09 moritzm: rearmed keyholder on deploy1002 following reboot
  • 13:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45033 and previous config saved to /var/cache/conftool/dbconfig/20230306-130854-marostegui.json
  • 13:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1002.eqiad.wmnet with OS bullseye
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P45032 and previous config saved to /var/cache/conftool/dbconfig/20230306-130438-marostegui.json
  • 12:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45031 and previous config saved to /var/cache/conftool/dbconfig/20230306-125707-marostegui.json
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45030 and previous config saved to /var/cache/conftool/dbconfig/20230306-125348-marostegui.json
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45029 and previous config saved to /var/cache/conftool/dbconfig/20230306-124932-marostegui.json
  • 12:48 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
  • 12:46 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1002.eqiad.wmnet with reason: host reimage
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T328817)', diff saved to https://phabricator.wikimedia.org/P45028 and previous config saved to /var/cache/conftool/dbconfig/20230306-124341-marostegui.json
  • 12:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45027 and previous config saved to /var/cache/conftool/dbconfig/20230306-124308-marostegui.json
  • 12:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P45026 and previous config saved to /var/cache/conftool/dbconfig/20230306-124200-marostegui.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P45025 and previous config saved to /var/cache/conftool/dbconfig/20230306-123841-marostegui.json
  • 12:32 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1002.eqiad.wmnet with OS bullseye
  • 12:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45024 and previous config saved to /var/cache/conftool/dbconfig/20230306-122801-marostegui.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45023 and previous config saved to /var/cache/conftool/dbconfig/20230306-122654-marostegui.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T329260)', diff saved to https://phabricator.wikimedia.org/P45022 and previous config saved to /var/cache/conftool/dbconfig/20230306-122546-marostegui.json
  • 12:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45021 and previous config saved to /var/cache/conftool/dbconfig/20230306-122524-marostegui.json
  • 12:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45020 and previous config saved to /var/cache/conftool/dbconfig/20230306-122334-marostegui.json
  • 12:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P45019 and previous config saved to /var/cache/conftool/dbconfig/20230306-121255-marostegui.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45018 and previous config saved to /var/cache/conftool/dbconfig/20230306-121018-marostegui.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1126 (T329203)', diff saved to https://phabricator.wikimedia.org/P45017 and previous config saved to /var/cache/conftool/dbconfig/20230306-120328-marostegui.json
  • 12:03 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 12:03 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1126.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45016 and previous config saved to /var/cache/conftool/dbconfig/20230306-115748-marostegui.json
  • 11:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P45015 and previous config saved to /var/cache/conftool/dbconfig/20230306-115511-marostegui.json
  • 11:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T328817)', diff saved to https://phabricator.wikimedia.org/P45014 and previous config saved to /var/cache/conftool/dbconfig/20230306-115201-marostegui.json
  • 11:51 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45013 and previous config saved to /var/cache/conftool/dbconfig/20230306-115140-marostegui.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P45012 and previous config saved to /var/cache/conftool/dbconfig/20230306-114354-marostegui.json
  • 11:42 vgutierrez: enable ESI testing in cp4044 - T308799
  • 11:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45011 and previous config saved to /var/cache/conftool/dbconfig/20230306-114004-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T329260)', diff saved to https://phabricator.wikimedia.org/P45010 and previous config saved to /var/cache/conftool/dbconfig/20230306-113856-marostegui.json
  • 11:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P45009 and previous config saved to /var/cache/conftool/dbconfig/20230306-113835-marostegui.json
  • 11:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45008 and previous config saved to /var/cache/conftool/dbconfig/20230306-113633-marostegui.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45007 and previous config saved to /var/cache/conftool/dbconfig/20230306-112847-marostegui.json
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45006 and previous config saved to /var/cache/conftool/dbconfig/20230306-112328-marostegui.json
  • 11:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P45005 and previous config saved to /var/cache/conftool/dbconfig/20230306-112126-marostegui.json
  • 11:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1004.eqiad.wmnet
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1004.eqiad.wmnet
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P45003 and previous config saved to /var/cache/conftool/dbconfig/20230306-111340-marostegui.json
  • 11:09 jbond: enable puppet fleet-wide after puppetdb reboot
  • 11:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P45002 and previous config saved to /var/cache/conftool/dbconfig/20230306-110822-marostegui.json
  • 11:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45001 and previous config saved to /var/cache/conftool/dbconfig/20230306-110620-marostegui.json
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T328817)', diff saved to https://phabricator.wikimedia.org/P45000 and previous config saved to /var/cache/conftool/dbconfig/20230306-110031-marostegui.json
  • 11:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 11:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44999 and previous config saved to /var/cache/conftool/dbconfig/20230306-110009-marostegui.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44998 and previous config saved to /var/cache/conftool/dbconfig/20230306-105834-marostegui.json
  • 10:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44997 and previous config saved to /var/cache/conftool/dbconfig/20230306-105315-marostegui.json
  • 10:52 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T329260)', diff saved to https://phabricator.wikimedia.org/P44996 and previous config saved to /var/cache/conftool/dbconfig/20230306-105206-marostegui.json
  • 10:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44995 and previous config saved to /var/cache/conftool/dbconfig/20230306-105145-marostegui.json
  • 10:49 jbond: disable puppet fleet-wide to reboot puppetdb
  • 10:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44994 and previous config saved to /var/cache/conftool/dbconfig/20230306-104503-marostegui.json
  • 10:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44993 and previous config saved to /var/cache/conftool/dbconfig/20230306-103639-marostegui.json
  • 10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host poolcounter1005.eqiad.wmnet
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T329203)', diff saved to https://phabricator.wikimedia.org/P44992 and previous config saved to /var/cache/conftool/dbconfig/20230306-103525-marostegui.json
  • 10:35 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 10:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44991 and previous config saved to /var/cache/conftool/dbconfig/20230306-103503-marostegui.json
  • 10:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host poolcounter1005.eqiad.wmnet
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P44990 and previous config saved to /var/cache/conftool/dbconfig/20230306-102956-marostegui.json
  • 10:29 vgutierrez: enable haproxy systemd service unit hardening in cp4045 - T323944
  • 10:29 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-conf1001.eqiad.wmnet with OS bullseye
  • 10:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P44989 and previous config saved to /var/cache/conftool/dbconfig/20230306-102132-marostegui.json
  • 10:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44988 and previous config saved to /var/cache/conftool/dbconfig/20230306-101957-marostegui.json
  • 10:18 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:17 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44987 and previous config saved to /var/cache/conftool/dbconfig/20230306-101450-marostegui.json
  • 10:12 otto@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 10:12 jayme@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:12 jayme@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T328817)', diff saved to https://phabricator.wikimedia.org/P44986 and previous config saved to /var/cache/conftool/dbconfig/20230306-100901-marostegui.json
  • 10:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44985 and previous config saved to /var/cache/conftool/dbconfig/20230306-100840-marostegui.json
  • 10:08 nfraison@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
  • 10:07 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44984 and previous config saved to /var/cache/conftool/dbconfig/20230306-100626-marostegui.json
  • 10:05 nfraison@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-conf1001.eqiad.wmnet with reason: host reimage
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P44983 and previous config saved to /var/cache/conftool/dbconfig/20230306-100450-marostegui.json
  • 10:04 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T329260)', diff saved to https://phabricator.wikimedia.org/P44982 and previous config saved to /var/cache/conftool/dbconfig/20230306-100417-marostegui.json
  • 10:04 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 10:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44981 and previous config saved to /var/cache/conftool/dbconfig/20230306-100356-marostegui.json
  • 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host deploy1002.eqiad.wmnet
  • 09:59 otto@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 09:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44980 and previous config saved to /var/cache/conftool/dbconfig/20230306-095333-marostegui.json
  • 09:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44979 and previous config saved to /var/cache/conftool/dbconfig/20230306-094944-marostegui.json
  • 09:49 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:49 nfraison@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44978 and previous config saved to /var/cache/conftool/dbconfig/20230306-094849-marostegui.json
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host deploy1002.eqiad.wmnet
  • 09:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P44977 and previous config saved to /var/cache/conftool/dbconfig/20230306-094341-root.json
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 09:42 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P44976 and previous config saved to /var/cache/conftool/dbconfig/20230306-093827-marostegui.json
  • 09:36 nfraison@cumin1001: START - Cookbook sre.hosts.reimage for host an-conf1001.eqiad.wmnet with OS bullseye
  • 09:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P44975 and previous config saved to /var/cache/conftool/dbconfig/20230306-093343-marostegui.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P44974 and previous config saved to /var/cache/conftool/dbconfig/20230306-092836-root.json
  • 09:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T329203)', diff saved to https://phabricator.wikimedia.org/P44973 and previous config saved to /var/cache/conftool/dbconfig/20230306-092557-marostegui.json
  • 09:25 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44972 and previous config saved to /var/cache/conftool/dbconfig/20230306-092536-marostegui.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44971 and previous config saved to /var/cache/conftool/dbconfig/20230306-092320-marostegui.json
  • 09:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44970 and previous config saved to /var/cache/conftool/dbconfig/20230306-091836-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T328817)', diff saved to https://phabricator.wikimedia.org/P44969 and previous config saved to /var/cache/conftool/dbconfig/20230306-091733-marostegui.json
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T329260)', diff saved to https://phabricator.wikimedia.org/P44968 and previous config saved to /var/cache/conftool/dbconfig/20230306-091728-marostegui.json
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44967 and previous config saved to /var/cache/conftool/dbconfig/20230306-091706-marostegui.json
  • 09:14 dcausse: depooling & restarting blazegraph on wdqs1006 (stuck for 48+ hours)
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P44966 and previous config saved to /var/cache/conftool/dbconfig/20230306-091330-root.json
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44965 and previous config saved to /var/cache/conftool/dbconfig/20230306-091030-marostegui.json
  • 09:06 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001 (duration: 00m 12s)
  • 09:06 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit1001
  • 09:04 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44964 and previous config saved to /var/cache/conftool/dbconfig/20230306-090416-marostegui.json
  • 09:02 vgutierrez: disabling haproxy systemd service unit hardening in ulsfo - T323944
  • 09:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44963 and previous config saved to /var/cache/conftool/dbconfig/20230306-090200-marostegui.json
  • 09:00 hashar@deploy2002: Finished deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002 (duration: 00m 07s)
  • 09:00 hashar@deploy2002: Started deploy [gerrit/gerrit@b725ff6]: Gerrit to 3.5.5 on gerrit2002
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P44962 and previous config saved to /var/cache/conftool/dbconfig/20230306-085825-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P44961 and previous config saved to /var/cache/conftool/dbconfig/20230306-085523-marostegui.json
  • 08:49 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44960 and previous config saved to /var/cache/conftool/dbconfig/20230306-084910-marostegui.json
  • 08:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P44959 and previous config saved to /var/cache/conftool/dbconfig/20230306-084653-marostegui.json
  • 08:43 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P44958 and previous config saved to /var/cache/conftool/dbconfig/20230306-084320-root.json
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44957 and previous config saved to /var/cache/conftool/dbconfig/20230306-084017-marostegui.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123', diff saved to https://phabricator.wikimedia.org/P44956 and previous config saved to /var/cache/conftool/dbconfig/20230306-083403-marostegui.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44955 and previous config saved to /var/cache/conftool/dbconfig/20230306-083147-marostegui.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T329260)', diff saved to https://phabricator.wikimedia.org/P44954 and previous config saved to /var/cache/conftool/dbconfig/20230306-083038-marostegui.json
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:30 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 08:28 moritzm: rolling restart of Apache on mw* to pick up apr-util security updates
  • 08:28 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P44953 and previous config saved to /var/cache/conftool/dbconfig/20230306-082815-root.json
  • 08:27 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44952 and previous config saved to /var/cache/conftool/dbconfig/20230306-082645-marostegui.json
  • 08:24 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-eqiad
  • 08:22 kartik@deploy2002: Finished scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) (duration: 19m 12s)
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44951 and previous config saved to /var/cache/conftool/dbconfig/20230306-081857-marostegui.json
  • 08:17 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T329203)', diff saved to https://phabricator.wikimedia.org/P44950 and previous config saved to /var/cache/conftool/dbconfig/20230306-081711-marostegui.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44949 and previous config saved to /var/cache/conftool/dbconfig/20230306-081639-marostegui.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P44948 and previous config saved to /var/cache/conftool/dbconfig/20230306-081310-root.json
  • 08:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1123 (T328817)', diff saved to https://phabricator.wikimedia.org/P44947 and previous config saved to /var/cache/conftool/dbconfig/20230306-081305-marostegui.json
  • 08:12 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1123.eqiad.wmnet with reason: Maintenance
  • 08:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44946 and previous config saved to /var/cache/conftool/dbconfig/20230306-081244-marostegui.json
  • 08:12 kartik@deploy2002: kartik: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44945 and previous config saved to /var/cache/conftool/dbconfig/20230306-081138-marostegui.json
  • 08:02 kartik@deploy2002: Started scap: Backport for Content Translation: Adjust the global limit for unedited MT to 95% (T330482)
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44944 and previous config saved to /var/cache/conftool/dbconfig/20230306-080132-marostegui.json
  • 08:00 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-eqiad
  • 07:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44943 and previous config saved to /var/cache/conftool/dbconfig/20230306-075737-marostegui.json
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P44942 and previous config saved to /var/cache/conftool/dbconfig/20230306-075632-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2122', diff saved to https://phabricator.wikimedia.org/P44941 and previous config saved to /var/cache/conftool/dbconfig/20230306-074830-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P44940 and previous config saved to /var/cache/conftool/dbconfig/20230306-074626-marostegui.json
  • 07:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P44939 and previous config saved to /var/cache/conftool/dbconfig/20230306-074231-marostegui.json
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44938 and previous config saved to /var/cache/conftool/dbconfig/20230306-074125-marostegui.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T329260)', diff saved to https://phabricator.wikimedia.org/P44937 and previous config saved to /var/cache/conftool/dbconfig/20230306-073707-marostegui.json
  • 07:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44936 and previous config saved to /var/cache/conftool/dbconfig/20230306-073119-marostegui.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44935 and previous config saved to /var/cache/conftool/dbconfig/20230306-072724-marostegui.json
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2094.codfw.wmnet
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:23 marostegui@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:22 marostegui@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2094.codfw.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1001"
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T328817)', diff saved to https://phabricator.wikimedia.org/P44934 and previous config saved to /var/cache/conftool/dbconfig/20230306-072132-marostegui.json
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 07:21 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1001: START - Cookbook sre.dns.netbox
  • 07:15 marostegui@cumin1001: START - Cookbook sre.hosts.decommission for hosts db2094.codfw.wmnet
  • 07:08 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T329203)', diff saved to https://phabricator.wikimedia.org/P44933 and previous config saved to /var/cache/conftool/dbconfig/20230306-070814-marostegui.json
  • 07:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 07:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 07:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 06:29 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/public to dumpsdata1007, no bandwidth cap
  • 06:03 apergos: rsync from dumpsdata1001 in ariel screen session of xmldatadumps/private to dumpsdata1007 (did this for dumpsdata1006 about an hour ago, forgot to log), no bandwidth cap

2023-03-04

  • 14:56 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 02m 17s)
  • 14:53 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
  • 14:44 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759 (duration: 08m 56s)
  • 14:35 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: Updating member dashboard to reflect new role names -- T330759
  • 14:32 andrew@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: (no justification provided) (duration: 00m 46s)
  • 14:31 andrew@deploy1002: Started deploy [horizon/deploy@9d02cd6]: (no justification provided)
  • 06:09 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1006, no bandwidth cap

2023-03-03

  • 20:58 inflatador: bking@cumin2002 persistently unban all elastic nodes in eqiad T322082
  • 20:55 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
  • 20:52 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1059 - bking@cumin2002 - T322082"
  • 20:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2070.codfw.wmnet with OS bullseye
  • 20:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:33 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1059.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:30 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:29 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
  • 20:24 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1058 - bking@cumin2002 - T322082"
  • 20:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:17 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:13 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 20:12 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:09 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:05 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1058.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:53 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:52 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
  • 19:49 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic hosts - bking@cumin2002 - T322082"
  • 19:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:42 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 19:40 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1057.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:39 bking@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:36 bking@cumin2002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1055 - bking@cumin2002 - T322082"
  • 19:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 19:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 19:11 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 19:02 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1055.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 18:43 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
  • 18:42 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1056 - bking@cumin2002 - T322082"
  • 18:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-be2070.codfw.wmnet with OS bullseye
  • 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:28 cmjohnson@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
  • 18:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:17 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1056.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 18:16 cmjohnson@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloucephosd - cmjohnson@cumin1001"
  • 18:13 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:47 krinkle@deploy2002: Synchronized wmf-config/mc.php: Ic55725: Prepare mc.php for next week train (duration: 07m 39s)
  • 17:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
  • 17:35 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Update location of elastic1054 - bking@cumin2002 - T322082"
  • 17:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 17:31 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2070.codfw.wmnet with reason: host reimage
  • 17:30 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
  • 17:29 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on releases2002.codfw.wmnet with reason: debugging
  • 17:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 17:01 inflatador: bking@cumin2002 ban elastic1059-1066 T322082
  • 16:56 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1054.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 16:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 16:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:44 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
  • 16:43 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
  • 16:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1061.eqiad.wmnet']
  • 16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:38 bking@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1060.eqiad.wmnet']
  • 16:37 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1059.eqiad.wmnet']
  • 16:36 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1058.eqiad.wmnet']
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2070.codfw.wmnet with OS bullseye
  • 16:10 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
  • 16:09 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "update location of elastic1053 - bking@cumin2002 - T322082"
  • 15:53 mforns@deploy2002: Finished deploy [airflow-dags/analytics@ad17aa9]: (no justification provided) (duration: 00m 22s)
  • 15:53 mforns@deploy2002: Started deploy [airflow-dags/analytics@ad17aa9]: (no justification provided)
  • 15:47 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:46 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:45 bking@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:43 xcollazo@deploy2002: Finished deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance (duration: 00m 21s)
  • 15:42 xcollazo@deploy2002: Started deploy [airflow-dags/platform_eng@8d9af3e]: Deploying latest image_suggestions DAG on platform_eng Airflow instance
  • 15:39 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:38 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:36 bking@cumin2002: START - Cookbook sre.hosts.provision for host elastic1053.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:33 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:32 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:32 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:28 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:27 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:27 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:26 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:26 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1057.eqiad.wmnet']
  • 15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']
  • 15:24 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:24 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1055.eqiad.wmnet']
  • 15:23 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
  • 15:21 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1004.wikimedia.org
  • 15:11 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1004.wikimedia.org on all recursors
  • 15:02 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1004.wikimedia.org on all recursors
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
  • 14:59 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1053.eqiad.wmnet']
  • 14:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1004.wikimedia.org - jmm@cumin2002"
  • 14:56 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1004.wikimedia.org
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host urldownloader1003.wikimedia.org
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) urldownloader1003.wikimedia.org on all recursors
  • 14:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache urldownloader1003.wikimedia.org on all recursors
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
  • 14:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 14 hosts with reason: rerack
  • 14:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 14 hosts with reason: rerack
  • 14:24 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:16 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM urldownloader1003.wikimedia.org - jmm@cumin2002"
  • 14:10 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 14:10 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host urldownloader1003.wikimedia.org
  • 14:09 inflatador: bking@cumin2002 banning elastic1053-59 from the cluster in preparation for T322082
  • 14:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:51 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 13:15 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 20485
  • 13:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 20485
  • 12:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:29 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 11:13 moritzm: imported PHP 7.4 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2+icu67u1 to component/icu67 (build of PHP against co-installable ICU67) T329491
  • 10:39 vgutierrez: restart ntp.service on dns2001
  • 10:30 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 10:25 moritzm: installing 5.10.162 kernels on buster systems running Linux 5.10
  • 10:12 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
  • 10:12 root@cumin2002: START - Cookbook sre.idm.logout Logging Jonas Kress (WMDE) out of all services on: 1119 hosts
  • 09:56 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 1119 hosts
  • 09:55 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 1119 hosts
  • 09:54 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tobias Andersson out of all services on: 909 hosts
  • 09:54 root@cumin2002: START - Cookbook sre.idm.logout Logging Tobias Andersson out of all services on: 909 hosts
  • 09:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Install software version upgrade
  • 09:45 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 09:27 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 09:10 elukey@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:10 elukey@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:07 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:01 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Install software version upgrade
  • 08:54 elukey: restart pybal on lvs2010 (standby) and then on lvs2009 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
  • 08:48 elukey: restart pybal on lvs1020 (standby) and then on lvs1019 (active) to pick up monitoring change (https://gerrit.wikimedia.org/r/c/operations/puppet/+/893008)
  • 08:45 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Install software version upgrade
  • 08:36 vgutierrez: restarting ntp on dns1001
  • 07:29 elukey: truncate /var/log/auth.log.1 on krb1001 to free space (root partition almost filled up)
  • 01:12 mutante: releases1002: deleting /usr/local/sbin/sync-srv-org-wikimedia-reprepro-releases1002.eqiad.wmnet, which confusingly contains an rsync command that pulls from releases1001, a host that no longer exists - T330960
  • 00:13 mutante: switching releases.wikimedia.org from eqiad to codfw - T330960
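
The cookbook entries above share a regular shape ("HH:MM user@host: START - Cookbook name ..." and "END (STATUS) - Cookbook name (exit_code=N) ..."), so runs of retries such as the firmware upgrades are easy to tally mechanically. The following is a minimal sketch, assuming that line shape holds for every cookbook entry; the regular expression and the names parse_cookbook_entry and count_outcomes are hypothetical helpers written for illustration, not part of any Wikimedia tool.

```python
import re
from collections import Counter

# Assumed entry shape (inferred from the log lines, not from any tool spec):
#   "• HH:MM user@host: START - Cookbook <name> ..."
#   "• HH:MM user@host: END (<STATUS>) - Cookbook <name> (exit_code=<N>) ..."
COOKBOOK_RE = re.compile(
    r"^\s*•\s*(?P<time>\d{2}:\d{2})\s+(?P<who>\S+):\s+"
    r"(?P<phase>START|END)(?:\s+\((?P<status>[A-Z]+)\))?\s+-\s+"
    r"Cookbook\s+(?P<cookbook>\S+)"
    r"(?:\s+\(exit_code=(?P<exit_code>\d+)\))?"
)


def parse_cookbook_entry(line):
    """Return a dict of fields for a cookbook entry, or None for other lines."""
    match = COOKBOOK_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    if entry["exit_code"] is not None:
        entry["exit_code"] = int(entry["exit_code"])
    return entry


def count_outcomes(lines):
    """Count END entries per (cookbook, status) pair; START lines are ignored."""
    counts = Counter()
    for line in lines:
        entry = parse_cookbook_entry(line)
        if entry is not None and entry["phase"] == "END":
            counts[(entry["cookbook"], entry["status"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "  • 15:32 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1056.eqiad.wmnet']",
        "  • 15:25 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1056.eqiad.wmnet']",
        "  • 10:39 vgutierrez: restart ntp.service in dns2001",
    ]
    for (cookbook, status), n in count_outcomes(sample).items():
        print(cookbook, status, n)
```

Free-form entries (manual notes, scap and helmfile lines) do not match the pattern and are skipped, so the tally covers cookbook runs only.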

2023-03-02

  • 23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2001-2003].codfw.wmnet
  • 23:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 22:45 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
  • 22:37 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 22:11 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts wdqs[2001-2003].codfw.wmnet
  • 21:22 TheresNoTime: close UTC late backport and config training
  • 21:10 samtar@deploy2002: Finished scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) (duration: 08m 03s)
  • 21:04 samtar@deploy2002: superpes and samtar: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 21:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2001.wikimedia.org with OS bullseye
  • 21:02 samtar@deploy2002: Started scap: Backport for [itwiki] Assign 'changetags' flag only to sysop/bot/botadmin (T331051)
  • 21:02 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 21:01 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:52 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1001.wikimedia.org with OS bullseye
  • 20:43 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
  • 20:40 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 20:39 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2001.wikimedia.org with reason: host reimage
  • 20:37 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe2004.codfw.wmnet with reason: host reimage
  • 20:23 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2001.wikimedia.org with OS bullseye
  • 20:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
  • 20:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1001.wikimedia.org with reason: host reimage
  • 20:12 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-fe2004.codfw.wmnet with OS bullseye
  • 20:08 brett@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 brett@cumin2002: START - Cookbook sre.dns.netbox
  • 20:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1001.wikimedia.org with OS bullseye
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2014.codfw.wmnet with OS bullseye
  • 19:59 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:31 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 19:27 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2014.codfw.wmnet with reason: host reimage
  • 19:05 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2014.codfw.wmnet with OS bullseye
  • 18:10 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:10 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:08 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:09 oblivian@deploy2002: Finished scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) (duration: 09m 16s)
  • 17:01 oblivian@deploy2002: oblivian: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:59 oblivian@deploy2002: Started scap: Backport for filebackend: hotfix - make swift master follow the mediawiki master (T330942)
  • 15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
  • 15:58 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix DNS typo in record for cr2-eqiad gr-3/3/0.2 - cmooney@cumin1001"
  • 15:55 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:41 jynus: restart db2099 T330218
  • 14:32 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:29 lucaswerkmeister-wmde@deploy2002: Finished scap: Backport for Remove unused Wikibase config variables (T330410) (duration: 08m 41s)
  • 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Remove unused Wikibase config variables (T330410) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:21 lucaswerkmeister-wmde@deploy2002: Started scap: Backport for Remove unused Wikibase config variables (T330410)
  • 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:49 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1010.eqiad.wmnet with OS bullseye
  • 13:48 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 13:47 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 13:46 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 13:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 13:40 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:48 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:48 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:47 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:47 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:46 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:46 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:11 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:00 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 10:42 claime: Running authdns-update for 893675
  • 10:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 10:21 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 10:16 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test (duration: 00m 12s)
  • 10:16 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Re-Deploy Airflow upgrade branch for analytics_test
  • 10:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 10:05 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dcaro@cumin1001"
  • 10:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ml-serve1006.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 09:50 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 09:48 dcaro@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
  • 09:47 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1008.eqiad.wmnet with reason: host reimage
  • 09:44 dcaro@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1010.eqiad.wmnet with reason: host reimage
  • 09:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 09:35 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1007.eqiad.wmnet with reason: host reimage
  • 09:28 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1010.eqiad.wmnet with OS bullseye
  • 09:20 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 09:14 jnuche@deploy2002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.25 refs T325588
  • 09:13 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1010']
  • 09:10 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 09:06 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
  • 09:04 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1010']
  • 08:58 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1010']
  • 08:58 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1010
  • 08:57 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1010
  • 08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:57 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
  • 08:46 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1010 to rack F4 - dcaro@cumin1001"
  • 08:39 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 08:38 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 08:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.cf
  • 08:34 marostegui: Stop MySQL on db2093 T330827
  • 08:19 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 08:18 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 08:15 apergos: started rsync of xmldatadumps/public from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
  • 08:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 08:05 elukey@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1006.eqiad.wmnet with reason: host reimage
  • 07:53 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 07:48 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:48 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:48 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:47 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:38 apergos: started rsync of xmldatadumps/private from dumpsdata1001 in screen session as ariel on that host, to dumpsdata1005, no bandwidth cap
  • 07:38 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:36 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 07:17 marostegui: Stop MySQL on db2095 T330975
  • 01:23 mutante: doc2001 - stopping apache2 to test alerting - active server is doc1002 but should be switched T327973 T330963
  • 01:08 mutante: releases2002 - stopping apache2 to test alerting (active server is 1002 but should be switched) T327975 T330960
  • 00:28 mutante: planet1002 - stopping apache2 to test alerting (active host is codfw)

2023-03-01

  • 23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1002.wikimedia.org with OS bullseye
  • 23:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
  • 22:56 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1002.wikimedia.org with reason: host reimage
  • 22:52 mutante: apt1001 - systemctl reset-failed T328907
  • 22:45 mforns@deploy2002: Finished deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided) (duration: 00m 23s)
  • 22:45 mforns@deploy2002: Started deploy [airflow-dags/analytics@1fb5c4a]: (no justification provided)
  • 22:42 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns1002.wikimedia.org with OS bullseye
  • 22:42 mforns@deploy2002: Finished deploy [airflow-dags/analytics@51e92b1]: (no justification provided) (duration: 00m 21s)
  • 22:42 mforns@deploy2002: Started deploy [airflow-dags/analytics@51e92b1]: (no justification provided)
  • 21:41 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a] (duration: 01m 22s)
  • 21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d4d723a]
  • 21:39 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a] (duration: 00m 07s)
  • 21:39 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a] (thin): Regular analytics weekly train THIN [analytics/refinery@d4d723a]
  • 21:38 mforns@deploy2002: Finished deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a] (duration: 10m 55s)
  • 21:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns2002.wikimedia.org with OS bullseye
  • 21:27 mforns@deploy2002: Started deploy [analytics/refinery@d4d723a]: Regular analytics weekly train [analytics/refinery@d4d723a]
  • 21:23 TheresNoTime: closing UTC late backport window
  • 21:18 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
  • 21:16 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns2002.wikimedia.org with reason: host reimage
  • 21:11 samtar@deploy2002: Finished scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) (duration: 09m 30s)
  • 21:04 samtar@deploy2002: superpes and samtar: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:02 samtar@deploy2002: Started scap: Backport for [trwiki] Reverting logo change for Vector 2022 and Vector legacy (T329047)
  • 21:02 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns2002.wikimedia.org with OS bullseye
  • 20:43 zabe: move rev_comment_id migration screens from mwmaint1002 to mwmaint2002 # T275246
  • 19:47 brett: re-adding dns3001 to next-hop routing via juniper - T321309
  • 19:36 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns3001.wikimedia.org with OS bullseye
  • 19:12 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
  • 19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns3001.wikimedia.org with reason: host reimage
  • 18:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host dns3001.wikimedia.org with OS bullseye
  • 18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 18:12 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:01 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cmooney@cumin1001"
  • 18:01 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1003.eqiad.wmnet with OS buster
  • 17:44 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
  • 17:41 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1005.eqiad.wmnet with reason: host reimage
  • 17:36 aqu@deploy2002: Finished deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test (duration: 00m 05s)
  • 17:36 aqu@deploy2002: Started deploy [airflow-dags/analytics_test@9568478]: Deploy Airflow upgrade branch for analytics_test
  • 17:26 root@cumin1001: END (PASS) - Cookbook sre.k8s.upgrade-cluster (exit_code=0) Upgrade K8s version: Upgrade to k8s 1.23
  • 17:24 cmooney@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 17:24 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 17:06 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 17:05 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 16:56 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 16:28 brett: Remove dns3001 DNS request routing via juniper - T321309
  • 16:28 XioNoX: rollback port 80 block in esams - T330683
  • 16:26 taavi@deploy2002: Finished scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) (duration: 08m 23s)
  • 16:21 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:20 hnowlan@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:20 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:20 taavi@deploy2002: taavi: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 16:19 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:19 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:18 taavi@deploy2002: Started scap: Backport for Set OATHAuthMultipleDevicesMigrationStage to MIGRATION_OLD (T242031)
  • 16:17 hnowlan@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:17 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:15 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 16:12 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 16:05 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 16:02 bblack: cr[23]-esams: manually adding brett's ssh-rsa to match https://gerrit.wikimedia.org/r/c/operations/homer/public/+/892551
  • 16:01 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-codfw
  • 16:00 dcaro@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcephosd1005.eqiad.wmnet with OS bullseye
  • 15:57 dcaro@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:57 dcaro@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:44 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:39 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:39 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on vrts2001.codfw.wmnet with reason: installation failed due to read-only database
  • 15:35 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:32 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-codfw
  • 15:28 root@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:22 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:20 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:elastic-canary
  • 15:18 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:elastic-canary
  • 15:12 hnowlan@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:11 root@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudcephosd1005']
  • 15:09 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 15:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 15:06 hashar: Restarting Apache on Gerrit host
  • 15:04 root@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1005']
  • 15:02 hnowlan@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:57 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 14:52 dcaro@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1005
  • 14:45 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 14:45 dcaro@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1005
  • 14:34 filippo@cumin1001: conftool action : set/pooled=no; selector: name=thanos-fe2002.codfw.wmnet,service=thanos-web
  • 14:33 elukey@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 14:32 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 14:30 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-canary
  • 14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1008.eqiad.wmnet with OS bullseye
  • 14:30 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1006.eqiad.wmnet with OS bullseye
  • 14:29 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-canary
  • 14:27 taavi: re-start persistRevisionThreadItems.php on itwiki from P44912 after DC switchover T315510
  • 14:27 claime: End mediawiki datacenter switchover - T327920
  • 14:26 cgoubert@deploy2002: Finished scap: Backport for debug.json: List primary DC servers first (T327920) (duration: 07m 54s)
  • 14:20 cgoubert@deploy2002: cgoubert: Backport for debug.json: List primary DC servers first (T327920) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 14:18 cgoubert@deploy2002: Started scap: Backport for debug.json: List primary DC servers first (T327920)
  • 14:16 claime: Removing scap lock - T327920
  • 14:15 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0)
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'Reduce db2122 weight', diff saved to https://phabricator.wikimedia.org/P44913 and previous config saved to /var/cache/conftool/dbconfig/20230301-141414-marostegui.json
  • 14:10 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters
  • 14:09 claime: Phase 9.5 DNS records for new database masters updated - T327920
  • 14:08 claime: Phase 9.5 Update DNS records for new database masters - T327920
  • 14:07 taavi: test
  • 14:06 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0)
  • 14:05 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl
  • 14:05 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0)
  • 14:03 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance
  • 14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners (exit_code=0)
  • 14:02 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.08-restart-envoy-on-jobrunners
  • 14:02 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0)
  • 14:02 cgoubert@cumin1001: MediaWiki read-only period ends at: 2023-03-01 14:02:09.272468
  • 14:00 cgoubert@cumin1001: MediaWiki read-only period starts at: 2023-03-01 14:00:10.075167
  • 14:00 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.02-set-readonly
  • 13:56 elukey@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:52 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:52 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0)
  • 13:51 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1007.eqiad.wmnet with OS bullseye
  • 13:51 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:49 cgoubert@cumin1001: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99)
  • 13:49 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance
  • 13:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0)
  • 13:41 cgoubert@cumin1001: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet
  • 13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:41 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
  • 13:40 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moved cloudcephosd1015 to rack F4 - dcaro@cumin1001"
  • 13:40 claime: Starting mediawiki datacenter switchover step 0 - T327920
  • 13:37 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 13:31 claime: Locking scap deployments for datacenter switchover - T327920
  • 13:30 krinkle@deploy2002: Synchronized wmf-config/: I3beefb filebackend cleanup (duration: 07m 13s)
  • 13:19 krinkle@deploy2002: Synchronized wmf-config/: Ie063fb - Remove config for former Rdbms logging (duration: 07m 39s)
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
  • 13:17 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
  • 13:11 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
  • 13:10 claime: Adding scheduled maintenance for switchover to statuspage - T327920
  • 13:09 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
  • 12:40 marostegui: Upgrade db2183 to 10.6 T330861
  • 12:28 moritzm: upgrade mwmaint to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 11:58 moritzm: upgrade parse/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 11:09 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:08 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:07 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:07 dcaro@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudcephosd1010.eqiad.wmnet
  • 11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:07 dcaro@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:03 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:02 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 11:01 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 11:00 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:59 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:58 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 10:57 moritzm: upgrade cloudweb to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 10:56 dcaro@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - dcaro@cumin1001"
  • 10:37 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 10:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 10:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 10:32 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1005.eqiad.wmnet with reason: host reimage
  • 10:30 root@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 10:25 dcaro@cumin1001: START - Cookbook sre.dns.netbox
  • 10:19 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 10:17 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:16 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 10:14 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1004.eqiad.wmnet with reason: host reimage
  • 10:14 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1003.eqiad.wmnet with reason: host reimage
  • 10:13 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1002.eqiad.wmnet with reason: host reimage
  • 10:11 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1001.eqiad.wmnet with reason: host reimage
  • 10:03 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1005.eqiad.wmnet with OS bullseye
  • 10:02 dcaro@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1010.eqiad.wmnet
  • 09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1004.eqiad.wmnet with OS bullseye
  • 09:59 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1003.eqiad.wmnet with OS bullseye
  • 09:58 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1002.eqiad.wmnet with OS bullseye
  • 09:57 marostegui: Stop db1117:3325 and db1176 T329478
  • 09:57 root@cumin1001: START - Cookbook sre.hosts.reimage for host ml-serve1001.eqiad.wmnet with OS bullseye
  • 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8309
  • 09:47 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:41 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8309
  • 09:39 oblivian@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad
  • 09:38 moritzm: installing tiff security updates
  • 09:31 oblivian@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro
  • 09:31 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 09:30 jnuche@deploy2002: Synchronized php: group1 wikis to 1.40.0-wmf.25 refs T325588 (duration: 07m 48s)
  • 09:26 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1002.eqiad.wmnet with reason: host reimage
  • 09:23 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.25 refs T325588
  • 09:15 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:15 root@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:58 root@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:56 root@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl1001.eqiad.wmnet with reason: host reimage
  • 08:51 moritzm: upgrade mw/eqiad to PHP 1:7.4.33-1+0~20221108.73+debian10~1.gbpa00350a+wmf10u2 T330270
  • 08:45 root@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-serve-ctrl1001.eqiad.wmnet with OS bullseye
  • 08:42 root@cumin1001: START - Cookbook sre.k8s.upgrade-cluster Upgrade K8s version: Upgrade to k8s 1.23
  • 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1003.eqiad.wmnet with OS bullseye
  • 08:41 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1002.eqiad.wmnet with OS bullseye
  • 08:40 elukey@cumin1001: END (PASS) - Cookbook sre.ganeti.reimage (exit_code=0) for host ml-etcd1001.eqiad.wmnet with OS bullseye
  • 08:37 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 918 hosts
  • 08:36 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 918 hosts
  • 08:35 root@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Emil Chetty out of all services on: 1110 hosts
  • 08:34 root@cumin2002: START - Cookbook sre.idm.logout Logging Emil Chetty out of all services on: 1110 hosts
  • 08:28 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
  • 08:26 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
  • 08:26 jynus: stopping db2184 for testing mariadb 10.6 recovery workflow T319383
  • 08:24 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
  • 08:21 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
  • 08:15 jynus@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
  • 08:14 jynus@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2184.codfw.wmnet with reason: 10.6 recovery
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1001.eqiad.wmnet with OS bullseye
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1002.eqiad.wmnet with OS bullseye
  • 08:11 elukey@cumin1001: START - Cookbook sre.ganeti.reimage for host ml-etcd1003.eqiad.wmnet with OS bullseye
  • 08:10 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on 13 hosts with reason: T330758
  • 08:10 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on 13 hosts with reason: T330758
  • 06:14 marostegui: Stop MySQL on db2094 T330828
  • 05:37 marostegui: Stop mysql on codfw sanitarium host db2095 (s2, s7, s6, s4) to clone db2187 T326596
  • 05:37 eileen: civicrm upgraded from ffc16d2d to fe2c06f6
  • 00:25 ejegg: civicrm rolled back from d199694e to ffc16d2d
  • 00:06 zabe@deploy2002: Finished scap: T198673 (duration: 07m 25s)

Archives

See Server Admin Log/Archives.