Server Admin Log
2025-05-12
- 20:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T392806)', diff saved to https://phabricator.wikimedia.org/P75933 and previous config saved to /var/cache/conftool/dbconfig/20250512-204336-fceratto.json
- 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 20:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75932 and previous config saved to /var/cache/conftool/dbconfig/20250512-204253-fceratto.json
- 20:40 tgr@deploy1003: tgr, krinkle: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:35 tgr@deploy1003: Started scap sync-world: Backport for mc: remove unused "memcached-pecl" definition from wgObjectCaches (T371378)
- 20:31 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts lvs3009.esams.wmnet
- 20:30 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts lvs3009.esams.wmnet
- 20:30 dr0ptp4kt@deploy1003: Finished scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959) (duration: 18m 53s)
- 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75931 and previous config saved to /var/cache/conftool/dbconfig/20250512-202746-fceratto.json
- 20:23 dr0ptp4kt@deploy1003: dr0ptp4kt: Continuing with sync
- 20:16 dr0ptp4kt@deploy1003: dr0ptp4kt: Backport for Stream config for edge uniques on prod cluster (T391959) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 sukhe@dns1004: END - running authdns-update
- 20:13 sukhe@dns1004: START - running authdns-update
- 20:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P75930 and previous config saved to /var/cache/conftool/dbconfig/20250512-201240-fceratto.json
- 20:11 dr0ptp4kt@deploy1003: Started scap sync-world: Backport for Stream config for edge uniques on prod cluster (T391959)
- 20:11 bearloga@deploy1003: Finished deploy [airflow-dags/analytics_product@17f8417]: (no justification provided) (duration: 00m 53s)
- 20:10 bearloga@deploy1003: Started deploy [airflow-dags/analytics_product@17f8417]: (no justification provided)
- 19:58 bking@dns1004: START - running authdns-update
- 19:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75929 and previous config saved to /var/cache/conftool/dbconfig/20250512-195732-fceratto.json
- 19:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) search-chi.svc.eqiad.wmnet on all recursors
- 19:49 bking@cumin2002: START - Cookbook sre.dns.wipe-cache search-chi.svc.eqiad.wmnet on all recursors
- 19:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T392806)', diff saved to https://phabricator.wikimedia.org/P75928 and previous config saved to /var/cache/conftool/dbconfig/20250512-194933-fceratto.json
- 19:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 19:40 bking@dns1004: START - running authdns-update
- 19:34 bking@dns1004: START - running authdns-update
- 19:20 jgleeson: payments-wiki upgraded from fac09775 to 92a8cbb8
- 18:46 dwisehaupt@dns1004: END - running authdns-update
- 18:45 dwisehaupt@dns1004: START - running authdns-update
- 18:37 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_ulsfo
- 18:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_ulsfo
- 18:01 cmooney@dns2005: END - running authdns-update
- 18:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:59 cmooney@dns2005: START - running authdns-update
- 17:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1070.eqiad.wmnet with OS bullseye
- 17:58 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:51 cmooney@dns2005: START - running authdns-update
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns recrods for new codfw switches - cmooney@cumin1002"
- 17:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: generate dns recrods for new codfw switches - cmooney@cumin1002"
- 17:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:31 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
- 17:28 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1070.eqiad.wmnet with reason: host reimage
- 17:25 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:25 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:16 krinkle@deploy1003: Finished scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist (duration: 17m 05s)
- 17:10 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1070
- 17:10 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1070
- 17:10 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1070.eqiad.wmnet with OS bullseye
- 17:09 krinkle@deploy1003: krinkle: Continuing with sync
- 17:04 krinkle@deploy1003: krinkle: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:59 krinkle@deploy1003: Started scap sync-world: Backport for tests: Remove one-off test-only getDblistsUsedInSettings() and isWikiFamily(), multiversion: Update readDbListFile() calls from alias to WmfConfig, tests: Replace array_keys(wikiversions.json) with all.dblist
- 16:52 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:52 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:34 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:33 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:32 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:32 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1002.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 16:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1070 to cirrussearch1070
- 16:28 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1070
- 16:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1070
- 16:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1070 on all recursors
- 16:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1070 on all recursors
- 16:26 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:26 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
- 16:25 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1070 to cirrussearch1070 - bking@cumin2002"
- 16:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1069.eqiad.wmnet with OS bullseye
- 16:17 jelto: update helm311 and helm317 on contint1002 contint2002 - T387548
- 16:16 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1070 to cirrussearch1070
- 16:16 dwisehaupt@dns1004: END - running authdns-update
- 16:15 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 16:15 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 16:14 dwisehaupt@dns1004: START - running authdns-update
- 16:05 jelto: update helm311 and helm317 on deploy1003 - T387548
- 16:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
- 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75925 and previous config saved to /var/cache/conftool/dbconfig/20250512-160230-fceratto.json
- 15:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1069.eqiad.wmnet with reason: host reimage
- 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75924 and previous config saved to /var/cache/conftool/dbconfig/20250512-154723-fceratto.json
- 15:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1069
- 15:44 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1069
- 15:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1069.eqiad.wmnet with OS bullseye
- 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1069 to cirrussearch1069
- 15:42 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1069
- 15:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1069
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1069 on all recursors
- 15:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1069 on all recursors
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
- 15:40 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1069 to cirrussearch1069 - bking@cumin2002"
- 15:35 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:35 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1069 to cirrussearch1069
- 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_ulsfo
- 15:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_ulsfo
- 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P75922 and previous config saved to /var/cache/conftool/dbconfig/20250512-153216-fceratto.json
- 15:27 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1068.eqiad.wmnet with OS bullseye
- 15:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75921 and previous config saved to /var/cache/conftool/dbconfig/20250512-151709-fceratto.json
- 15:13 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 15:13 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 15:12 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 15:12 elukey@cumin1002: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T392806)', diff saved to https://phabricator.wikimedia.org/P75920 and previous config saved to /var/cache/conftool/dbconfig/20250512-151020-fceratto.json
- 15:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 15:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 15:05 volans: upgraded spicerack to v10.2.0 on cumin1002
- 15:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75919 and previous config saved to /var/cache/conftool/dbconfig/20250512-150454-fceratto.json
- 15:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
- 14:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1068.eqiad.wmnet with reason: host reimage
- 14:58 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 14:58 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 60 hosts
- 14:57 elukey@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 14:57 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 60 hosts
- 14:57 elukey@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 14:54 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 60 hosts with reason: surpress CirrusSearchNodeIndexingNotIncreasing alerts with CODFW is depooled
- 14:50 dancy@deploy1003: Installation of scap version "4.163.0" completed for 2 hosts
- 14:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75918 and previous config saved to /var/cache/conftool/dbconfig/20250512-144948-fceratto.json
- 14:48 dancy@deploy1003: Installing scap version "4.163.0" for 2 host(s)
- 14:44 jelto: update helm311 and helm317 on deploy2002 - T387548
- 14:42 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1068
- 14:42 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1068
- 14:42 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1068.eqiad.wmnet with OS bullseye
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1068 to cirrussearch1068
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1068
- 14:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
- 14:39 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
- 14:39 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 14:35 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P75917 and previous config saved to /var/cache/conftool/dbconfig/20250512-143441-fceratto.json
- 14:27 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1068
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1068 on all recursors
- 14:27 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1068 on all recursors
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:27 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
- 14:27 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1068 to cirrussearch1068 - bking@cumin2002"
- 14:23 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:23 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:23 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:23 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
- 14:22 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 14:21 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.10.1 - volans@cumin1003
- 14:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75916 and previous config saved to /var/cache/conftool/dbconfig/20250512-141933-fceratto.json
- 14:17 tgr@deploy1003: Finished scap sync-world: Backport for Improve session logging (T393038) (duration: 17m 24s)
- 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T392806)', diff saved to https://phabricator.wikimedia.org/P75915 and previous config saved to /var/cache/conftool/dbconfig/20250512-141139-fceratto.json
- 14:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75914 and previous config saved to /var/cache/conftool/dbconfig/20250512-141114-fceratto.json
- 14:10 tgr@deploy1003: tgr: Continuing with sync
- 14:04 tgr@deploy1003: tgr: Backport for Improve session logging (T393038) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:59 tgr@deploy1003: Started scap sync-world: Backport for Improve session logging (T393038)
- 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75913 and previous config saved to /var/cache/conftool/dbconfig/20250512-135607-fceratto.json
- 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:54 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:52 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 13:51 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
- 13:45 hashar@deploy1003: Finished deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0 (duration: 00m 11s)
- 13:45 hashar@deploy1003: Started deploy [integration/docroot@21bebf5]: build: Updating mediawiki/mediawiki-codesniffer to 47.0.0
- 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P75912 and previous config saved to /var/cache/conftool/dbconfig/20250512-134100-fceratto.json
- 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 13:34 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790) (duration: 14m 50s)
- 13:29 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:29 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75911 and previous config saved to /var/cache/conftool/dbconfig/20250512-132552-fceratto.json
- 13:25 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Continuing with sync
- 13:22 lucaswerkmeister-wmde@deploy1003: stran, lucaswerkmeister-wmde: Backport for htmlform: Fix rendering contents for cloner fields (T393790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for htmlform: Fix rendering contents for cloner fields (T393790)
- 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T392806)', diff saved to https://phabricator.wikimedia.org/P75910 and previous config saved to /var/cache/conftool/dbconfig/20250512-131756-fceratto.json
- 13:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 13:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75909 and previous config saved to /var/cache/conftool/dbconfig/20250512-131731-fceratto.json
- 13:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:15 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 13:15 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 13:14 pfischer@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:14 pfischer@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 13:12 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 13:08 tgr@deploy1003: Finished scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) (duration: 32m 12s)
- 13:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75908 and previous config saved to /var/cache/conftool/dbconfig/20250512-130225-fceratto.json
- 13:01 elukey: `puppet ca destroy thanos.discovery.wmnet` on puppetmaster1001 - old cert not used anymore
- 12:59 tgr@deploy1003: tgr, mszabo: Continuing with sync
- 12:52 tgr@deploy1003: tgr, mszabo: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P75907 and previous config saved to /var/cache/conftool/dbconfig/20250512-124718-fceratto.json
- 12:36 tgr@deploy1003: Started scap sync-world: Backport for Get rid of ancient session_name call (T124371), Do not use $_SESSION (T29887 T124371), Set wgPHPSessionHandling to 'warn' (T362324)
- 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75906 and previous config saved to /var/cache/conftool/dbconfig/20250512-123211-fceratto.json
- 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T392806)', diff saved to https://phabricator.wikimedia.org/P75905 and previous config saved to /var/cache/conftool/dbconfig/20250512-122626-fceratto.json
- 12:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75904 and previous config saved to /var/cache/conftool/dbconfig/20250512-122600-fceratto.json
- 12:25 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 12:24 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 12:18 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 12:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75903 and previous config saved to /var/cache/conftool/dbconfig/20250512-121053-fceratto.json
- 11:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P75902 and previous config saved to /var/cache/conftool/dbconfig/20250512-115545-fceratto.json
- 11:45 jgleeson: civicrm upgraded from dc096105 to 852c6ee6
- 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75901 and previous config saved to /var/cache/conftool/dbconfig/20250512-114038-fceratto.json
- 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T392806)', diff saved to https://phabricator.wikimedia.org/P75900 and previous config saved to /var/cache/conftool/dbconfig/20250512-113350-fceratto.json
- 11:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 11:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75899 and previous config saved to /var/cache/conftool/dbconfig/20250512-113324-fceratto.json
- 11:25 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:22 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75898 and previous config saved to /var/cache/conftool/dbconfig/20250512-111817-fceratto.json
- 11:17 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:16 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Revert to v0.9.0 - volans@cumin1003
- 11:12 volans@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
- 11:11 volans@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin[1002-1003].eqiad.wmnet with reason: Release v0.10.0 - volans@cumin1003
- 11:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 11:08 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 11:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
- 11:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1088.eqiad.wmnet with OS bullseye
- 11:03 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
- 11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P75897 and previous config saved to /var/cache/conftool/dbconfig/20250512-110310-fceratto.json
- 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75896 and previous config saved to /var/cache/conftool/dbconfig/20250512-104803-fceratto.json
- 10:47 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 10:44 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1088.eqiad.wmnet with reason: host reimage
- 10:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T392806)', diff saved to https://phabricator.wikimedia.org/P75895 and previous config saved to /var/cache/conftool/dbconfig/20250512-104116-fceratto.json
- 10:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 10:32 XioNoX: delete some exterminated cables from Netbox - T393188
- 10:31 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be1088.eqiad.wmnet with OS bullseye
- 10:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 10:22 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 10:08 Ammar: Ran fixStuckGlobalRename.php for T393877
- 09:36 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 09:25 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts thanos-fe[2001-2003].codfw.wmnet
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:04 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:03 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: thanos-fe[2001-2003].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1002"
- 09:00 mvernon@cumin1002: START - Cookbook sre.dns.netbox
- 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:55 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 08:50 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-fe[2001-2003].codfw.wmnet
- 08:49 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts thanos-be[2001-2003].codfw.wmnet
- 08:48 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts thanos-be[2001-2003].codfw.wmnet
- 08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on P{thanos-fe200[4-7]*} or P{thanos-fe1*} and (A:thanos-fe or A:thanos-fe-codfw or A:thanos-fe-eqiad)
- 08:39 mvernon@cumin1002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=1) rolling restart_daemons on A:thanos-fe
- 08:39 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 08:35 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 08:34 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 08:33 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:31 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 08:29 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1003.eqiad.wmnet
- 08:29 mvernon@cumin1002: conftool action : set/weight=40; selector: service=apus,name=apus-fe1003.eqiad.wmnet
- 08:10 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 08:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 07:57 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Cicalese out of all services on: 2402 hosts
- 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:12 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Debt out of all services on: 2402 hosts
2025-05-11
- 22:55 tchin@deploy1003: Finished deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962 (duration: 02m 01s)
- 22:54 tchin@deploy1003: Started deploy [airflow-dags/analytics@301c74b]: Deploying airflow artifacts for T384962
2025-05-10
- 00:41 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:41 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:23 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:22 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:16 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 00:15 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
2025-05-09
- 23:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 22:10 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 22:09 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:57 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 21:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 20:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1068 to cirrussearch1068
- 20:52 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:50 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1068 to cirrussearch1068
- 20:46 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on elastic1054.eqiad.wmnet with reason: downtime prior to decom
- 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 20:39 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:35 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:30 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
- 20:23 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
- 20:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:20 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:18 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
- 20:15 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1006.eqiad.wmnet with reason: host reimage
- 20:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host cirrussearch1053
- 20:14 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host cirrussearch1053
- 20:14 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1053.eqiad.wmnet with OS bullseye
- 20:11 jgreen@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:09 jgreen@cumin1002: START - Cookbook sre.dns.netbox
- 20:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1053 to cirrussearch1053
- 20:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1053
- 20:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1053
- 20:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1053 on all recursors
- 20:05 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1053 on all recursors
- 20:05 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:05 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
- 20:04 inflatador: bking@cumin2002 removed unrelated `fran1001` DNS record during a rename
- 20:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1053 to cirrussearch1053 - bking@cumin2002"
- 20:00 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:00 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 19:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 19:50 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:50 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:48 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 19:45 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:45 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['elastic1054.eqiad.wmnet']
- 19:24 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1007.eqiad.wmnet with OS bullseye
- 19:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:56 ryankemper@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1012.eqiad.wmnet|wdqs1013.eqiad.wmnet|wdqs1014.eqiad.wmnet|wdqs1015.eqiad.wmnet|wdqs2007.codfw.wmnet|wdqs2010.codfw.wmnet|wdqs2011.codfw.wmnet|wdqs2012.codfw.wmnet|wdqs2013.codfw.wmnet
- 18:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1006.eqiad.wmnet with OS bullseye
- 18:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1007
- 18:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:23 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1007
- 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:23 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
- 18:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1007 - vriley@cumin1002"
- 18:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1006
- 18:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1006
- 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:14 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
- 18:14 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1006 - vriley@cumin1002"
- 18:11 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 17:05 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 16:21 krinkle@deploy1003: Finished scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php (duration: 13m 42s)
- 16:20 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0 (duration: 01m 49s)
- 16:19 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@bfb9c63]: bump image suggestions to 1.6.0
- 16:14 krinkle@deploy1003: krinkle: Continuing with sync
- 16:14 krinkle@deploy1003: krinkle: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:07 krinkle@deploy1003: Started scap sync-world: Backport for noc: Fix "Class MWMultiVersion not found" in wiki.php
- 15:57 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from elastic1053 to cirrussearch1053
- 15:57 bking@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:57 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 15:49 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.rename (exit_code=93) from elastic1053 to cirrussearch1053
- 15:49 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1053 to cirrussearch1053
- 15:41 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 15m 22s)
- 15:36 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:34 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 15:32 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 15:25 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
- 14:30 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 38s)
- 14:29 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 14:25 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 31s)
- 14:24 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 14:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) (duration: 14m 12s)
- 14:15 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Continuing with sync
- 14:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Bump wikibase-data-values-value-view to HEAD (T389633 T393641)
- 13:36 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 04m 10s)
- 13:32 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 12:51 godog: upload prometheus-blackbox-exporter 0.26.0-0~bpo12+1 to bookworm-wikimedia - T385022
- 11:45 taavi: update Toolforge ARC-enabled exim4 packages (component/exim4-arc) to latest in Debian 12 T356171
- 11:17 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 11:02 mvernon@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:58 mvernon@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin1002"
- 10:40 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
- 10:37 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-fe1005.eqiad.wmnet with reason: host reimage
- 10:20 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 09:50 moritzm: imported debmonitor-client 0.4.0-3+deb13u1 for trixie-wikimedia T391083
- 09:05 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393761" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=amwiki --logwiki=metawiki 'Jeroen' 'Retireduser-vfs199s31yvbtxsfmygg'
- 09:03 zabe: zabe@deploy1003:~$ mwscript-k8s --comment="T393372" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwikibooks --logwiki=metawiki 'Adityaindumdum' 'Renamed user a71c8354dc822ea0d3aab24d1ce886f02c25fe91'
- 08:17 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 08:10 volans@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
- 08:09 volans@cumin2002: START - Cookbook sre.deploy.python-code homer to cumin1003.eqiad.wmnet with reason: Release v0.9.0 - volans@cumin2002
- 07:57 moritzm: imported puppet-agent 7.23.0-1+wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 07:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 07:22 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 07:15 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 06:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2010.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:26 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 06:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 05:30 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on cumin1003.eqiad.wmnet with reason: WIP new Bookworm host
- 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 05:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 05:09 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 04:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 06s)
- 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:03 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:03 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:02 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:02 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 04:01 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 05s)
- 04:01 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 00:07 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_drmrs
- 00:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_drmrs
2025-05-08
- 23:37 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2012.codfw.wmnet
- 23:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2011.codfw.wmnet
- 23:35 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1014.eqiad.wmnet
- 23:34 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2010.codfw.wmnet
- 23:30 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1015.eqiad.wmnet
- 23:26 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1013.eqiad.wmnet
- 23:22 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2013.codfw.wmnet
- 23:19 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs2007.codfw.wmnet
- 23:06 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.categories-reload (exit_code=0) reloading categories to wdqs1012.eqiad.wmnet
- 22:28 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 22:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 22:17 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host thanos-fe1005.eqiad.wmnet with OS bullseye
- 22:05 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2047.codfw.wmnet with OS bookworm
- 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2048.codfw.wmnet with OS bookworm
- 21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:50 tzatziki: removing 1 file for legal compliance
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2013.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2012.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2011.codfw.wmnet
- 21:48 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2010.codfw.wmnet
- 21:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host thanos-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host thanos-fe1005
- 21:45 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host thanos-fe1005
- 21:44 tzatziki: removing 3 files for legal compliance
- 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1015.eqiad.wmnet
- 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
- 21:44 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1014.eqiad.wmnet
- 21:43 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt thanos-fe1005 - vriley@cumin1002"
- 21:43 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1013.eqiad.wmnet
- 21:43 ryankemper: T388134 Cutover completed about an hour ago. Metrics look good; we're in the process of shifting over some of the old `wdqs` hosts to `wdqs-main` to increase capacity
- 21:40 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 21:38 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs[2007,2013].codfw.wmnet,wdqs[1012-1014].eqiad.wmnet with reason: bringing hosts online with a data transfer
- 21:35 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
- 21:34 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 21:33 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs2007.codfw.wmnet
- 21:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 21:29 bking@cumin2002: START - Cookbook sre.wdqs.categories-reload reloading categories to wdqs1012.eqiad.wmnet
- 20:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_drmrs
- 20:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_drmrs
- 20:54 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
- 20:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
- 20:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2047.codfw.wmnet with reason: host reimage
- 20:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2048.codfw.wmnet with reason: host reimage
- 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:14 ryankemper: T388134 Beginning cutover of query.wikidata.org from `wdqs` to `wdqs-main`. Starting to see requests increase on wdqs-main (and decrease on wdqs) as expected. Rolling change to rest of cp text hosts. Traffic should be fully moved over in ~20 mins
- 20:03 swfrench@deploy1003: Stopping before sync operations
- 20:03 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to switch mw-script/main to PHP 8.1 - T391057
- 19:30 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:08 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
- 19:04 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe1003.eqiad.wmnet with reason: host reimage
- 19:01 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.28 refs T386223
- 18:48 sukhe@dns1004: END - running authdns-update
- 18:46 sukhe@dns1004: START - running authdns-update
- 18:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:45 zabe: move all translatable subpages of "Wikimedia Foundation Board of Trustees" to subpages of "Wikimedia Foundation/Board of Trustees" on metawiki (T393619)
- 18:43 zabe: mwscript-k8s [...]moveTranslatableBundle.php metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request T393619"
- 18:42 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback: Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback: Board of Trustees elections" "Zabe" --reason "per request
- 18:38 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees/Call for feedback:2022 Board of Trustees election/Upcoming Call for Feedback about the Board of Trustees elections" "Wikimedia Foundation/Board of Trustees/Call for feedback:2022 Board of
- 18:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[27-32].eqsin.wmnet} and A:cp
- 18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp50[19-24].eqsin.wmnet} and A:cp
- 18:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
- 18:03 dancy@deploy1003: Installation of scap version "4.162.0" completed for 2 hosts
- 18:01 dancy@deploy1003: Installing scap version "4.162.0" for 2 host(s)
- 17:38 cdanis@dns1004: END - running authdns-update
- 17:36 cdanis@dns1004: START - running authdns-update
- 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-text_eqsin
- 17:28 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on A:cp-upload_eqsin
- 17:25 cdanis@dns1004: END - running authdns-update
- 17:23 cdanis@dns1004: START - running authdns-update
- 17:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
- 17:12 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1112.eqiad.wmnet|cirrussearch1113.eqiad.wmnet|cirrussearch1114.eqiad.wmnet|cirrussearch1115.eqiad.wmnet|cirrussearch1116.eqiad.wmnet|cirrussearch1117.eqiad.wmnet|cirrussearch1118.eqiad.wmnet|cirrussearch1119.eqiad.wmnet|cirrussearch1120.eqiad.wmnet|cirrussearch1121.eqiad.wmnet|cirrussearch1122.eqiad.wmnet|cirrussearch1123.eqiad.wmn
- 17:09 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch1111.eqiad.wmnet|name=cirrussearch1112.eqiad.wmnet|name=cirrussearch1113.eqiad.wmnet|name=cirrussearch1114.eqiad.wmnet|name=cirrussearch1115.eqiad.wmnet|name=cirrussearch1116.eqiad.wmnet|name=cirrussearch1117.eqiad.wmnet|name=cirrussearch1118.eqiad.wmnet|name=cirrussearch1119.eqiad.wmnet|name=cirrussearch1120.eqiad.wmnet|name=cirru
- 17:06 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:05 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:04 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_eqsin
- 16:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_eqsin
- 16:49 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 16:48 fabfur: repooling cp7001 (T393671)
- 16:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp7001.magru.wmnet
- 16:48 fabfur@cumin1002: START - Cookbook sre.hosts.remove-downtime for cp7001.magru.wmnet
- 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 16:29 brett@dns1005: END - running authdns-update
- 16:28 brett@dns1005: START - running authdns-update
- 16:27 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 16:22 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
- 16:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
- 16:13 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
- 16:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
- 16:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2048 to codfw - jhancock@cumin2002"
- 16:09 sukhe@dns1004: END - running authdns-update
- 16:08 sukhe@dns1004: START - running authdns-update
- 16:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 15:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:07 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'run-puppet-agent'
- 15:03 jmm@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 14:57 jmm@cumin1003: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 14:45 moritzm: imported ripe-atlas-tools 2.3.0-3+wmf12u1 to apt.wikimedia.org/bookworm T389380
- 14:45 moritzm: imported ripe-atlas-sagan 1.3.1-1~wmf12u1 to apt.wikimedia.org/bookworm T389380
- 14:36 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1246.eqiad.wmnet with OS bookworm
- 14:34 James_F: Running `foreachwiki extensions/Echo/maintenance/removeInvalidNotification.php --remove # T389673` for MatmaRex
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts restbase[1028-1030].eqiad.wmnet
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:23 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 14:21 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 14:21 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 14:21 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 14:20 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 14:20 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 14:20 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: restbase[1028-1030].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eevans@cumin1002"
- 14:20 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 14:14 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 14:12 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: host reimage
- 14:03 eevans@cumin1002: START - Cookbook sre.dns.netbox
- 13:52 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host db1246.eqiad.wmnet with OS bookworm
- 13:51 eevans@cumin1002: START - Cookbook sre.hosts.decommission for hosts restbase[1028-1030].eqiad.wmnet
- 13:42 volans: forced removal of db1246 from puppetdb to unblock reimage (was failing due to a puppet change in the meantime)
- 13:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 13:34 tchanders@deploy1003: Finished scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) (duration: 16m 30s)
- 13:27 tchanders@deploy1003: tchanders, kharlan: Continuing with sync
- 13:24 tchanders@deploy1003: tchanders, kharlan: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:17 tchanders@deploy1003: Started scap sync-world: Backport for temp accounts: Remove AutopromoteOnce configuration (T393358)
- 13:03 moritzm: installing jetty9 security updates
- 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
- 12:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
- 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon2002.wikimedia.org
- 12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netmon2002.wikimedia.org
- 11:57 moritzm: import transferpy 1.1+deb12u1 to bookworm-wikimedia T389380
- 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping2004.codfw.wmnet
- 11:44 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:43 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 11:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2004.codfw.wmnet
- 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1004.eqiad.wmnet
- 11:32 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
- 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1004.eqiad.wmnet
- 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 11:19 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
- 11:15 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
- 10:45 zabe: zabe@deploy1003:~$ mwscript-k8s --attach -- extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki "Wikimedia Foundation Board of Trustees" "Wikimedia Foundation/Board of Trustees" "Zabe" --reason "per request T393619"
- 10:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 10:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 09:26 Emperor: swift delete wikipedia-commons-local-public.e7 'e/e7/Hawkmoth_(Meganoton_nyctiphanes)_(8688240817).jpg' ms-fe1009 and ms-fe2009 T392658
- 09:02 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:53 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:52 mvernon@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 08:37 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp7001.magru.wmnet with reason: Testing in progress
- 08:19 dcausse: closing UTC morning backport window
- 08:12 dcausse@deploy1003: Finished scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610) (duration: 23m 19s)
- 08:06 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 08:05 fabfur: depooling and disabling puppet on cp7001 to perform tests (T393671)
- 08:03 dcausse@deploy1003: dcausse: Continuing with sync
- 07:56 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 29s)
- 07:55 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:55 dcausse@deploy1003: dcausse: Backport for cirrus: explicitly route search traffic to codfw (T388610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:52 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 00m 42s)
- 07:51 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:49 dcausse@deploy1003: Started scap sync-world: Backport for cirrus: explicitly route search traffic to codfw (T388610)
- 07:46 fab@deploy1003: Finished deploy [airflow-dags/research@e3ccac9]: (no justification provided) (duration: 05m 42s)
- 07:40 fab@deploy1003: Started deploy [airflow-dags/research@e3ccac9]: (no justification provided)
- 07:40 fab@deploy1003: Finished deploy [airflow-dags/research@4367417]: (no justification provided) (duration: 00m 40s)
- 07:39 fab@deploy1003: Started deploy [airflow-dags/research@4367417]: (no justification provided)
- 07:06 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
- 07:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
- 07:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:56 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
- 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
- 06:54 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
- 06:47 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.9
- 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
- 06:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1047.eqiad.wmnet
- 01:52 tstarling@deploy1003: Finished scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) (duration: 46m 12s)
- 01:38 tstarling@deploy1003: tstarling: Continuing with sync
- 01:37 tstarling@deploy1003: tstarling: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 01:06 tstarling@deploy1003: Started scap sync-world: Backport for Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601), Use CONTENTLANGUAGE rather than USERLANGUAGE (T393601)
- 00:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_esams
- 00:09 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_esams
2025-05-07
- 21:33 ladsgroup@deploy1003: Finished scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) (duration: 14m 12s)
- 21:29 ejegg: payments-wiki upgraded from 822bac34 to fac09775
- 21:27 ladsgroup@deploy1003: ladsgroup, sbisson: Continuing with sync
- 21:26 ladsgroup@deploy1003: ladsgroup, sbisson: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 21:19 ladsgroup@deploy1003: Started scap sync-world: Backport for ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839), ApiQueryPublishedTranslations: Make `from` and `to` mandatory (T392839)
- 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_esams
- 21:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_esams
- 21:05 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-upload_magru
- 21:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) (duration: 17m 21s)
- 21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
- 21:01 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1124.eqiad.wmnet with reason: host reimage
- 20:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on A:cp-text_magru
- 20:56 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Continuing with sync
- 20:55 ladsgroup@deploy1003: jdlrobson, bvibber, ladsgroup: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 20:49 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 20:48 ladsgroup@deploy1003: Started scap sync-world: Backport for Charts phase 1 deployment (T393517), Clear floats to avoid tall charts (T393286), Clear floats to avoid tall charts (T393286)
- 20:46 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) (duration: 41m 41s)
- 20:33 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 20:33 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:28 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 20:26 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009*} and A:liberica (T393616)
- 20:26 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009*} and A:liberica (T393616)
- 20:25 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 20:23 sukhe: depooling lvs3009 for HW maint: T393616
- 20:04 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513), Improve circuit breaking error message (T360930), Remove hard-coded timestamps in SpecialGlobalContributionsTest (T393531)
- 19:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:55 hmonroy@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) (duration: 17m 21s)
- 18:49 hmonroy@deploy1003: hmonroy: Continuing with sync
- 18:45 hmonroy@deploy1003: hmonroy: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1125.eqiad.wmnet with OS bullseye
- 18:38 hmonroy@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in Hebrew wiki (T377121)
- 18:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:31 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 18:30 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 18:29 dancy@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.28 refs T386223
- 18:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
- 18:21 volans: uploaded spicerack_10.2.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia
- 18:20 aokoth@dns1004: END - running authdns-update
- 18:19 aokoth@dns1004: START - running authdns-update
- 18:18 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1125.eqiad.wmnet with reason: host reimage
- 18:14 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.eqiad.wmnet with OS bookworm
- 18:13 dancy@deploy1003: Finished scap build-images: (no justification provided) (duration: 00m 30s)
- 18:12 dancy@deploy1003: Started scap build-images: (no justification provided)
- 18:07 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1125.eqiad.wmnet with OS bullseye
- 18:06 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1124.eqiad.wmnet with OS bullseye
- 18:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1123.eqiad.wmnet with OS bullseye
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1125 to cirrussearch1125
- 18:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1122.eqiad.wmnet with OS bullseye
- 18:01 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1125
- 17:53 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove whatlinkshere hook (T393513) (duration: 36m 00s)
- 17:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:40 ladsgroup@deploy1003: ladsgroup: Continuing with sync
- 17:37 ladsgroup@deploy1003: ladsgroup: Backport for Remove whatlinkshere hook (T393513) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
- 17:35 swfrench-wmf: deploy1003 and deploy2002 updated to PHP 8.1 - T392938
- 17:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
- 17:34 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:31 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1123.eqiad.wmnet with reason: host reimage
- 17:29 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1125
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1125 on all recursors
- 17:29 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1125 on all recursors
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:29 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
- 17:29 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1125 to cirrussearch1125 - bking@cumin2002"
- 17:28 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 17:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1122.eqiad.wmnet with reason: host reimage
- 17:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1124 to cirrussearch1124
- 17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-upload_magru
- 17:24 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1124
- 17:23 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1124
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1124 on all recursors
- 17:23 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:23 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1124 on all recursors
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:23 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
- 17:23 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1124 to cirrussearch1124 - bking@cumin2002"
- 17:20 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1125 to cirrussearch1125
- 17:19 bking@cumin2002: START - Cookbook sre.dns.netbox
- 17:17 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 17:17 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1124 to cirrussearch1124
- 17:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
- 17:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1123.eqiad.wmnet with OS bullseye
- 17:15 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
- 17:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on A:cp-text_magru
- 17:13 swfrench-wmf: disable-puppet "In-place update to PHP 8.1 - T392938" on deploy1003 and deploy2002
- 17:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1123 to cirrussearch1123
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1123
- 17:08 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1122.eqiad.wmnet with OS bullseye
- 17:08 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1123
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1123 on all recursors
- 17:08 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1123 on all recursors
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
- 17:08 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1123 to cirrussearch1123 - bking@cumin2002"
- 17:08 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1122 to cirrussearch1122
- 17:07 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1122
- 17:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1122
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1122 on all recursors
- 17:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1122 on all recursors
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
- 17:04 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1122 to cirrussearch1122 - bking@cumin2002"
- 17:04 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:58 cdanis: per dwisehaupt T196336 💙cdanis@alert1002.wikimedia.org ~ 🕐☕ sudo systemctl restart nsca.service
- 16:58 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1123 to cirrussearch1123
- 16:56 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1122 to cirrussearch1122
- 16:43 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 06m 07s)
- 16:36 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 16:36 ladsgroup@deploy1003: sync-world aborted: Backport for Remove whatlinkshere hook (T393513) (duration: 29m 10s)
- 16:31 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 16:31 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 16:31 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:30 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1121.eqiad.wmnet with OS bullseye
- 16:09 kamila@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 16:09 kamila@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:07 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 16:07 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 16:07 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove whatlinkshere hook (T393513)
- 15:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1120.eqiad.wmnet with OS bullseye
- 15:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
- 15:53 moritzm: uploaded a python-pynetbox 7.4.1-1~wmf12u1 to bookworm-wikimedia (needed for Cumin update) T389380
- 15:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1121.eqiad.wmnet with reason: host reimage
- 15:49 zabe: zabe@mwmaint1002:~$ mwscript findBadBlobs.php enwiki --revisions 276146284,819689534,1289169661 --mark "T393237"
- 15:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1119.eqiad.wmnet with OS bullseye
- 15:43 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1247.eqiad.wmnet with reason: Host has crashed - T393612
- 15:40 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1121.eqiad.wmnet with OS bullseye
- 15:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1121 to cirrussearch1121
- 15:39 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1121
- 15:38 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be1060.eqiad.wmnet
- 15:38 mvernon@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1121
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1121 on all recursors
- 15:37 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1121 on all recursors
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
- 15:37 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
- 15:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1121 to cirrussearch1121 - bking@cumin2002"
- 15:36 mvernon@cumin1002: START - Cookbook sre.dns.netbox
- 15:32 cdanis@cumin1002: dbctl commit (dc=all): 'depool db1247', diff saved to https://phabricator.wikimedia.org/P75876 and previous config saved to /var/cache/conftool/dbconfig/20250507-153228-cdanis.json
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 15:31 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1120.eqiad.wmnet with reason: host reimage
- 15:31 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 15:31 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 15:31 bking@cumin2002: START - Cookbook sre.dns.netbox
- 15:30 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 15:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 15:30 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:30 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:29 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:28 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1121 to cirrussearch1121
- 15:26 mvernon@cumin1002: START - Cookbook sre.hosts.decommission for hosts ms-be1060.eqiad.wmnet
- 15:21 damilare: civicrm upgraded from 6ffbde61 to dc096105
- 15:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1118.eqiad.wmnet with OS bullseye
- 15:14 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
- 15:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1120.eqiad.wmnet with OS bullseye
- 15:10 sukhe@dns1004: END - running authdns-update
- 15:10 sukhe: timing authdns-update for T393602
- 15:09 sukhe@dns1004: START - running authdns-update
- 15:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1119.eqiad.wmnet with reason: host reimage
- 15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1120 to cirrussearch1120
- 15:08 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1120
- 15:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
- 15:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081*,elastic1083* for thread pool rejections - bking@cumin2002
- 15:06 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1120
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1120 on all recursors
- 15:06 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1120 on all recursors
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:06 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
- 15:06 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1120 to cirrussearch1120 - bking@cumin2002"
- 15:06 sukhe: sudo cumin -b1 -s10 'A:dnsbox' 'sudo -u authdns git -C /srv/authdns/git maintenance run' T393602
- 15:05 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1016.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
- 15:04 mvernon@cumin1002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1015.eqiad.wmnet
- 15:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
- 15:04 sukhe@dns1004: END - running authdns-update
- 15:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1060*,elastic1081* for thread pool rejections - bking@cumin2002
- 15:04 Emperor: pool ms-fe1015 ms-fe1016 new frontends T388886 T391354
- 15:02 sukhe@dns1004: START - running authdns-update
- 15:00 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
- 14:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1081* for thread pool rejections - bking@cumin2002
- 14:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1120 to cirrussearch1120
- 14:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1119.eqiad.wmnet with OS bullseye
- 14:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1119 to cirrussearch1119
- 14:47 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1119
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1117.eqiad.wmnet with OS bullseye
- 14:40 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1119
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1119 on all recursors
- 14:40 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1119 on all recursors
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:40 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
- 14:39 moritzm: installing openjdk-17 security updates
- 14:39 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1118.eqiad.wmnet with reason: host reimage
- 14:33 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1119 to cirrussearch1119 - bking@cumin2002"
- 14:29 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1116.eqiad.wmnet with OS bullseye
- 14:26 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:26 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1119 to cirrussearch1119
- 14:15 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
- 14:15 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:14 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:14 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:13 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:12 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
- 14:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1117.eqiad.wmnet with reason: host reimage
- 14:09 sukhe@dns1004: END - running authdns-update
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1118.eqiad.wmnet with OS bullseye
- 14:09 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1116.eqiad.wmnet with reason: host reimage
- 14:08 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:07 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:07 sukhe@dns1004: START - running authdns-update
- 14:06 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1118 to cirrussearch1118
- 14:05 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 14:05 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:05 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1118
- 14:04 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:03 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1118
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1118 on all recursors
- 14:03 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1118 on all recursors
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:03 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
- 14:03 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1118 to cirrussearch1118 - bking@cumin2002"
- 14:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:59 sukhe@dns1004: END - running authdns-update
- 13:59 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:58 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1118 to cirrussearch1118
- 13:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1117.eqiad.wmnet with OS bullseye
- 13:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1116.eqiad.wmnet with OS bullseye
- 13:57 sukhe@dns1004: START - running authdns-update
- 13:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1117 to cirrussearch1117
- 13:52 moritzm: installing nginx security updates
- 13:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1117
- 13:50 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 13:50 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1117
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1117 on all recursors
- 13:50 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1117 on all recursors
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:50 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
- 13:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1117 to cirrussearch1117 - bking@cumin2002"
- 13:47 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 13:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1047.eqiad.wmnet
- 13:43 mvernon@cumin1002: END (ERROR) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=97) rolling restart_daemons on A:swift-fe-eqiad
- 13:43 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:41 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1016.eqiad.wmnet
- 13:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 13:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1016.eqiad.wmnet
- 13:36 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-fe1015.eqiad.wmnet
- 13:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1116 to cirrussearch1116
- 13:34 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1116
- 13:33 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1116
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1116 on all recursors
- 13:33 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1116 on all recursors
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:33 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
- 13:33 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:32 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1116 to cirrussearch1116 - bking@cumin2002"
- 13:31 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1117 to cirrussearch1117
- 13:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-fe1015.eqiad.wmnet
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1046.eqiad.wmnet
- 13:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1046.eqiad.wmnet
- 13:28 bking@cumin2002: START - Cookbook sre.dns.netbox
- 13:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1116 to cirrussearch1116
- 13:25 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 13:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1046.eqiad.wmnet
- 13:21 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 13:21 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 13:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1046.eqiad.wmnet
- 13:15 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1045.eqiad.wmnet
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1045.eqiad.wmnet
- 13:07 moritzm: installing poppler security updates
- 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1045.eqiad.wmnet
- 13:07 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 13:07 hashar: Restarted Apache httpd server on Gerrit server
- 13:07 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1045.eqiad.wmnet
- 12:58 Amir1: [wikishared]> CREATE INDEX translation_last_updated_timestamp ON cx_translations (translation_last_updated_timestamp); (T392839)
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1044.eqiad.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1044.eqiad.wmnet
- 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1044.eqiad.wmnet
- 12:38 moritzm: installing imagemagick security updates
- 12:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1044.eqiad.wmnet
- 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1043.eqiad.wmnet
- 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1043.eqiad.wmnet
- 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1043.eqiad.wmnet
- 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1043.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1042.eqiad.wmnet
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1042.eqiad.wmnet
- 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1042.eqiad.wmnet
- 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1042.eqiad.wmnet
- 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 11:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1041.eqiad.wmnet
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1041.eqiad.wmnet
- 11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
- 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1041.eqiad.wmnet
- 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
- 10:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1041.eqiad.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1040.eqiad.wmnet
- 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1040.eqiad.wmnet
- 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1040.eqiad.wmnet
- 10:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1040.eqiad.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1039.eqiad.wmnet
- 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1039.eqiad.wmnet
- 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1039.eqiad.wmnet
- 10:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1039.eqiad.wmnet
- 10:27 moritzm: upgrading krb2002 to Bookworm T390863
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
- 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
- 10:22 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on krb2002.codfw.wmnet with reason: update to Bookworm
- 10:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
- 10:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
- 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1037.eqiad.wmnet
- 10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1037.eqiad.wmnet
- 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
- 09:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
- 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1036.eqiad.wmnet
- 09:54 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1036.eqiad.wmnet
- 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
- 09:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
- 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1035.eqiad.wmnet
- 08:55 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1035.eqiad.wmnet
- 08:54 XioNoX: update `host-inbound-traffic system-services` on pfw1-eqiad - T390052
- 08:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
- 08:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
- 08:09 zabe@deploy1003: Finished scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) (duration: 19m 01s)
- 08:02 zabe@deploy1003: zabe: Continuing with sync
- 07:56 zabe@deploy1003: zabe: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:50 zabe@deploy1003: Started scap sync-world: Backport for SkinTemplate: Restore a string 'class' in tabAction() (T393504)
- 07:17 slyngshede@dns1004: END - running authdns-update
- 07:14 slyngshede@dns1004: START - running authdns-update
- 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61588
- 06:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61588
- 06:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24441
- 06:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 24441
- 06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268097
- 06:53 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268097
- 06:53 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 35847
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 35847
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 264595
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 264595
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268517
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268517
- 06:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263569
- 06:51 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263569
- 06:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
- 06:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
- 06:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
- 06:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
- 06:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
- 06:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
- 05:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
- 05:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
- 05:48 XioNoX: decom Tele2 transit in esams - T393401
- 05:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
- 05:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
- 05:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
- 05:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
- 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250507-042334-ladsgroup.json
- 04:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75869 and previous config saved to /var/cache/conftool/dbconfig/20250507-040826-ladsgroup.json
- 03:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P75868 and previous config saved to /var/cache/conftool/dbconfig/20250507-035319-ladsgroup.json
- 03:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75867 and previous config saved to /var/cache/conftool/dbconfig/20250507-033812-ladsgroup.json
- 03:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T382778)', diff saved to https://phabricator.wikimedia.org/P75866 and previous config saved to /var/cache/conftool/dbconfig/20250507-033518-ladsgroup.json
- 03:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75865 and previous config saved to /var/cache/conftool/dbconfig/20250507-033455-ladsgroup.json
- 03:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75864 and previous config saved to /var/cache/conftool/dbconfig/20250507-031947-ladsgroup.json
- 03:07 tstarling@deploy1003: Finished scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) (duration: 32m 06s)
- 03:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P75863 and previous config saved to /var/cache/conftool/dbconfig/20250507-030440-ladsgroup.json
- 02:58 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
- 02:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75862 and previous config saved to /var/cache/conftool/dbconfig/20250507-024933-ladsgroup.json
- 02:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T382778)', diff saved to https://phabricator.wikimedia.org/P75861 and previous config saved to /var/cache/conftool/dbconfig/20250507-024638-ladsgroup.json
- 02:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 02:45 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75860 and previous config saved to /var/cache/conftool/dbconfig/20250507-024518-ladsgroup.json
- 02:41 tstarling@deploy1003: tstarling, musikanimal: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 02:34 tstarling@deploy1003: Started scap sync-world: Backport for Hooks: disable if content model is unset AND CodeMirror beta is set (T373711)
- 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75859 and previous config saved to /var/cache/conftool/dbconfig/20250507-023009-ladsgroup.json
- 02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P75858 and previous config saved to /var/cache/conftool/dbconfig/20250507-021502-ladsgroup.json
- 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75857 and previous config saved to /var/cache/conftool/dbconfig/20250507-015955-ladsgroup.json
- 01:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T382778)', diff saved to https://phabricator.wikimedia.org/P75856 and previous config saved to /var/cache/conftool/dbconfig/20250507-015658-ladsgroup.json
- 01:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 01:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75855 and previous config saved to /var/cache/conftool/dbconfig/20250507-015636-ladsgroup.json
- 01:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75854 and previous config saved to /var/cache/conftool/dbconfig/20250507-014128-ladsgroup.json
- 01:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P75853 and previous config saved to /var/cache/conftool/dbconfig/20250507-012621-ladsgroup.json
- 01:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75852 and previous config saved to /var/cache/conftool/dbconfig/20250507-011114-ladsgroup.json
- 01:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T382778)', diff saved to https://phabricator.wikimedia.org/P75851 and previous config saved to /var/cache/conftool/dbconfig/20250507-010811-ladsgroup.json
- 01:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75850 and previous config saved to /var/cache/conftool/dbconfig/20250507-010748-ladsgroup.json
- 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75849 and previous config saved to /var/cache/conftool/dbconfig/20250507-005240-ladsgroup.json
- 00:39 hmonroy@deploy1003: Finished scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) (duration: 47m 14s)
- 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P75848 and previous config saved to /var/cache/conftool/dbconfig/20250507-003733-ladsgroup.json
- 00:33 andrew@dns1004: END - running authdns-update
- 00:30 andrew@dns1004: START - running authdns-update
- 00:26 hmonroy@deploy1003: hmonroy, musikanimal: Continuing with sync
- 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75847 and previous config saved to /var/cache/conftool/dbconfig/20250507-002226-ladsgroup.json
- 00:21 hmonroy@deploy1003: hmonroy, musikanimal: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:19 andrew@dns1004: END - running authdns-update
- 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T382778)', diff saved to https://phabricator.wikimedia.org/P75846 and previous config saved to /var/cache/conftool/dbconfig/20250507-001924-ladsgroup.json
- 00:19 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75845 and previous config saved to /var/cache/conftool/dbconfig/20250507-001901-ladsgroup.json
- 00:16 andrew@dns1004: START - running authdns-update
- 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75844 and previous config saved to /var/cache/conftool/dbconfig/20250507-000354-ladsgroup.json
2025-05-06
- 23:52 hmonroy@deploy1003: Started scap sync-world: Backport for Revert "JavaScript: ESLint 8.57.0" (T381577)
- 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P75843 and previous config saved to /var/cache/conftool/dbconfig/20250506-234846-ladsgroup.json
- 23:37 hmonroy@deploy1003: Finished scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) (duration: 14m 17s)
- 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75842 and previous config saved to /var/cache/conftool/dbconfig/20250506-233339-ladsgroup.json
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T382778)', diff saved to https://phabricator.wikimedia.org/P75841 and previous config saved to /var/cache/conftool/dbconfig/20250506-233041-ladsgroup.json
- 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 23:30 hmonroy@deploy1003: musikanimal, hmonroy: Continuing with sync
- 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75840 and previous config saved to /var/cache/conftool/dbconfig/20250506-233002-ladsgroup.json
- 23:29 hmonroy@deploy1003: musikanimal, hmonroy: Backport for InitialiseSettings: enable multiblocks on group0 (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:22 hmonroy@deploy1003: Started scap sync-world: Backport for InitialiseSettings: enable multiblocks on group0 (T377121)
- 23:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1115.eqiad.wmnet with OS bullseye
- 23:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1114.eqiad.wmnet with OS bullseye
- 23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75839 and previous config saved to /var/cache/conftool/dbconfig/20250506-231454-ladsgroup.json
- 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P75838 and previous config saved to /var/cache/conftool/dbconfig/20250506-225947-ladsgroup.json
- 22:51 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
- 22:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
- 22:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1115.eqiad.wmnet with reason: host reimage
- 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75837 and previous config saved to /var/cache/conftool/dbconfig/20250506-224440-ladsgroup.json
- 22:44 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1114.eqiad.wmnet with reason: host reimage
- 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T382778)', diff saved to https://phabricator.wikimedia.org/P75836 and previous config saved to /var/cache/conftool/dbconfig/20250506-224132-ladsgroup.json
- 22:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75835 and previous config saved to /var/cache/conftool/dbconfig/20250506-224110-ladsgroup.json
- 22:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 22:34 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1115.eqiad.wmnet with OS bullseye
- 22:32 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1114.eqiad.wmnet with OS bullseye
- 22:29 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75834 and previous config saved to /var/cache/conftool/dbconfig/20250506-222603-ladsgroup.json
- 22:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 22:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 22:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 22:13 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P75833 and previous config saved to /var/cache/conftool/dbconfig/20250506-221056-ladsgroup.json
- 22:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 22:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 22:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1115 to cirrussearch1115
- 22:02 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1115
- 22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 22:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 22:01 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1115
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1115 on all recursors
- 22:01 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1115 on all recursors
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:01 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
- 22:00 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:59 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75832 and previous config saved to /var/cache/conftool/dbconfig/20250506-215549-ladsgroup.json
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T382778)', diff saved to https://phabricator.wikimedia.org/P75831 and previous config saved to /var/cache/conftool/dbconfig/20250506-215242-ladsgroup.json
- 21:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75830 and previous config saved to /var/cache/conftool/dbconfig/20250506-215219-ladsgroup.json
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 21:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 21:40 andrew@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 21:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75829 and previous config saved to /var/cache/conftool/dbconfig/20250506-213712-ladsgroup.json
- 21:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1112.eqiad.wmnet with OS bullseye
- 21:28 ryankemper: T388134 Seeing 502 errors; that explains why the drop in requests to wdqs-full is not matched by an increase to wdqs-main. Rolling back for now while we figure out what piece we're missing
- 21:24 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1115 to cirrussearch1115 - bking@cumin2002"
- 21:23 ryankemper: T388134 Cutover of query.wikidata.org to `wdqs-main` instead of `wdqs` is ongoing. We're seeing the expected drop in queries to the main cluster (https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs&from=1746565806937&to=1746566592047) but not seeing corresponding increase in wdqs-main yet
- 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P75828 and previous config saved to /var/cache/conftool/dbconfig/20250506-212204-ladsgroup.json
- 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:18 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 21:18 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 21:17 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 21:17 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:17 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 21:16 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1115 to cirrussearch1115
- 21:16 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 21:16 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1114 to cirrussearch1114
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 21:15 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1114
- 21:12 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1114
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1114 on all recursors
- 21:12 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1114 on all recursors
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
- 21:12 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
- 21:12 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1113.eqiad.wmnet with reason: host reimage
- 21:10 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1114 to cirrussearch1114 - bking@cumin2002"
- 21:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:07 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1112.eqiad.wmnet with reason: host reimage
- 21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75827 and previous config saved to /var/cache/conftool/dbconfig/20250506-210658-ladsgroup.json
- 21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 21:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 21:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 21:03 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T382778)', diff saved to https://phabricator.wikimedia.org/P75826 and previous config saved to /var/cache/conftool/dbconfig/20250506-210329-ladsgroup.json
- 21:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 21:03 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1114 to cirrussearch1114
- 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75825 and previous config saved to /var/cache/conftool/dbconfig/20250506-210307-ladsgroup.json
- 21:02 ryankemper@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=wdqs1011.eqiad.wmnet|wdqs1016.eqiad.wmnet|wdqs1017.eqiad.wmnet|wdqs2008.codfw.wmnet|wdqs2014.codfw.wmnet|wdqs2015.codfw.wmnet
- 21:01 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 21:00 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1113.eqiad.wmnet with OS bullseye
- 20:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1113 to cirrussearch1113
- 20:58 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1113
- 20:57 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1113
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1113 on all recursors
- 20:57 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1113 on all recursors
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:57 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
- 20:56 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1113 to cirrussearch1113 - bking@cumin2002"
- 20:56 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1112.eqiad.wmnet with OS bullseye
- 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75824 and previous config saved to /var/cache/conftool/dbconfig/20250506-204758-ladsgroup.json
- 20:45 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 20:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 20:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 20:43 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:42 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1113 to cirrussearch1113
- 20:40 andrew@cumin1002: DONE (FAIL) - Cookbook sre.puppet.renew-cert (exit_code=99) for cloudrabbit2001-dev.codfw.wmnet: Renew puppet certificate - andrew@cumin1002
- 20:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1112 to cirrussearch1112
- 20:37 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1112
- 20:36 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1112
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1112 on all recursors
- 20:36 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1112 on all recursors
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:36 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
- 20:36 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1112 to cirrussearch1112 - bking@cumin2002"
- 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P75823 and previous config saved to /var/cache/conftool/dbconfig/20250506-203251-ladsgroup.json
- 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:28 bking@cumin2002: START - Cookbook sre.dns.netbox
- 20:27 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1112 to cirrussearch1112
- 20:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75822 and previous config saved to /var/cache/conftool/dbconfig/20250506-201744-ladsgroup.json
- 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T382778)', diff saved to https://phabricator.wikimedia.org/P75821 and previous config saved to /var/cache/conftool/dbconfig/20250506-201421-ladsgroup.json
- 20:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 20:13 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 20:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 20:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 20:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 20:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75820 and previous config saved to /var/cache/conftool/dbconfig/20250506-201145-ladsgroup.json
- 19:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75819 and previous config saved to /var/cache/conftool/dbconfig/20250506-195638-ladsgroup.json
- 19:46 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 19:43 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 19:42 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 19:42 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 19:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P75818 and previous config saved to /var/cache/conftool/dbconfig/20250506-194131-ladsgroup.json
- 19:41 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 19:38 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 19:38 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75817 and previous config saved to /var/cache/conftool/dbconfig/20250506-192624-ladsgroup.json
- 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling neither afterwards
- 19:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T382778)', diff saved to https://phabricator.wikimedia.org/P75816 and previous config saved to /var/cache/conftool/dbconfig/20250506-192333-ladsgroup.json
- 19:23 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 19:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 19:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 19:21 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2014.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling neither afterwards
- 19:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75815 and previous config saved to /var/cache/conftool/dbconfig/20250506-192054-ladsgroup.json
- 19:20 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 19:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75814 and previous config saved to /var/cache/conftool/dbconfig/20250506-190547-ladsgroup.json
- 18:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P75813 and previous config saved to /var/cache/conftool/dbconfig/20250506-185040-ladsgroup.json
- 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75812 and previous config saved to /var/cache/conftool/dbconfig/20250506-183533-ladsgroup.json
- 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T382778)', diff saved to https://phabricator.wikimedia.org/P75811 and previous config saved to /var/cache/conftool/dbconfig/20250506-183222-ladsgroup.json
- 18:32 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 18:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75810 and previous config saved to /var/cache/conftool/dbconfig/20250506-183159-ladsgroup.json
- 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:17 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.28 refs T386223
- 18:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75808 and previous config saved to /var/cache/conftool/dbconfig/20250506-181652-ladsgroup.json
- 18:13 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:12 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling source-only afterwards
- 18:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P75807 and previous config saved to /var/cache/conftool/dbconfig/20250506-180146-ladsgroup.json
- 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 17:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 17:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 17:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75806 and previous config saved to /var/cache/conftool/dbconfig/20250506-174639-ladsgroup.json
- 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 17:44 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 17:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 17:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T382778)', diff saved to https://phabricator.wikimedia.org/P75805 and previous config saved to /var/cache/conftool/dbconfig/20250506-174325-ladsgroup.json
- 17:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75804 and previous config saved to /var/cache/conftool/dbconfig/20250506-174313-ladsgroup.json
- 17:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:40 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 17:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer categories from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1017.eqiad.wmnet, repooling source-only afterwards
- 17:30 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:29 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 11s)
- 17:29 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 17:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75803 and previous config saved to /var/cache/conftool/dbconfig/20250506-172807-ladsgroup.json
- 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P75802 and previous config saved to /var/cache/conftool/dbconfig/20250506-171259-ladsgroup.json
- 17:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75801 and previous config saved to /var/cache/conftool/dbconfig/20250506-165752-ladsgroup.json
- 16:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T382778)', diff saved to https://phabricator.wikimedia.org/P75800 and previous config saved to /var/cache/conftool/dbconfig/20250506-165438-ladsgroup.json
- 16:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75799 and previous config saved to /var/cache/conftool/dbconfig/20250506-165415-ladsgroup.json
- 16:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75798 and previous config saved to /var/cache/conftool/dbconfig/20250506-163908-ladsgroup.json
- 16:34 denisse: enable Puppet on Grafana2001 - T384841
- 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
- 16:34 cdanis@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
- 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - cdanis@cumin1002
- 16:33 cdanis@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - cdanis@cumin1002"
- 16:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P75797 and previous config saved to /var/cache/conftool/dbconfig/20250506-162401-ladsgroup.json
- 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75796 and previous config saved to /var/cache/conftool/dbconfig/20250506-160854-ladsgroup.json
- 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T382778)', diff saved to https://phabricator.wikimedia.org/P75795 and previous config saved to /var/cache/conftool/dbconfig/20250506-160535-ladsgroup.json
- 16:05 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 16:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250506-160507-ladsgroup.json
- 16:04 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75793 and previous config saved to /var/cache/conftool/dbconfig/20250506-155000-ladsgroup.json
- 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:48 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:48 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:45 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1246.eqiad.wmnet with reason: Host has crashed - T393296
- 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P75792 and previous config saved to /var/cache/conftool/dbconfig/20250506-153453-ladsgroup.json
- 15:28 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
- 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75790 and previous config saved to /var/cache/conftool/dbconfig/20250506-151946-ladsgroup.json
- 15:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cirrussearch1111.eqiad.wmnet with OS bullseye
- 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T382778)', diff saved to https://phabricator.wikimedia.org/P75789 and previous config saved to /var/cache/conftool/dbconfig/20250506-151652-ladsgroup.json
- 15:16 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75788 and previous config saved to /var/cache/conftool/dbconfig/20250506-151629-ladsgroup.json
- 15:11 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:11 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
- 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75787 and previous config saved to /var/cache/conftool/dbconfig/20250506-150122-ladsgroup.json
- 14:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cirrussearch1111.eqiad.wmnet with reason: host reimage
- 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P75786 and previous config saved to /var/cache/conftool/dbconfig/20250506-144615-ladsgroup.json
- 14:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cirrussearch1111.eqiad.wmnet with OS bullseye
- 14:44 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1177.eqiad.wmnet with reason: Harddrive replacement
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from elastic1111 to cirrussearch1111
- 14:43 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1156.eqiad.wmnet with reason: Harddrive replacement
- 14:43 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cirrussearch1111
- 14:41 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cirrussearch1111
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cirrussearch1111 on all recursors
- 14:41 bking@cumin2002: START - Cookbook sre.dns.wipe-cache cirrussearch1111 on all recursors
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:41 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 14:41 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:37 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
- 14:37 bking@cumin2002: START - Cookbook sre.dns.netbox
- 14:37 jnuche@deploy1003: Installation of scap version "4.161.0" completed for 2 hosts
- 14:36 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating IPs for cloudrabbit200[123]-dev - andrew@cumin1002"
- 14:36 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
- 14:35 jnuche@deploy1003: Installing scap version "4.161.0" for 2 host(s)
- 14:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 14:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 14:32 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75785 and previous config saved to /var/cache/conftool/dbconfig/20250506-143108-ladsgroup.json
- 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T382778)', diff saved to https://phabricator.wikimedia.org/P75784 and previous config saved to /var/cache/conftool/dbconfig/20250506-142748-ladsgroup.json
- 14:27 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75783 and previous config saved to /var/cache/conftool/dbconfig/20250506-142726-ladsgroup.json
- 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
- 14:25 oblivian@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
- 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfixes - oblivian@cumin1002
- 14:25 oblivian@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfixes - oblivian@cumin1002"
- 14:23 tgr_: UTC afternoon deploys done
- 14:20 tgr@deploy1003: Finished scap sync-world: Backport for logging: Add context processor (T142313) (duration: 20m 37s)
- 14:15 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on wdqs1017.eqiad.wmnet with reason: bringing host online after reimage
- 14:13 tgr@deploy1003: tgr: Continuing with sync
- 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75782 and previous config saved to /var/cache/conftool/dbconfig/20250506-141220-ladsgroup.json
- 14:06 tgr@deploy1003: tgr: Backport for logging: Add context processor (T142313) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:59 tgr@deploy1003: Started scap sync-world: Backport for logging: Add context processor (T142313)
- 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P75781 and previous config saved to /var/cache/conftool/dbconfig/20250506-135713-ladsgroup.json
- 13:53 tgr@deploy1003: Finished scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) (duration: 16m 32s)
- 13:44 tgr@deploy1003: tgr: Continuing with sync
- 13:43 tgr@deploy1003: tgr: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75780 and previous config saved to /var/cache/conftool/dbconfig/20250506-134207-ladsgroup.json
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T382778)', diff saved to https://phabricator.wikimedia.org/P75779 and previous config saved to /var/cache/conftool/dbconfig/20250506-133943-ladsgroup.json
- 13:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75778 and previous config saved to /var/cache/conftool/dbconfig/20250506-133920-ladsgroup.json
- 13:36 tgr@deploy1003: Started scap sync-world: Backport for private: Drop $wgCentralAuthSul3SharedDomainRestrictions (T390329)
- 13:25 tgr@deploy1003: Finished scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [
- 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75777 and previous config saved to /var/cache/conftool/dbconfig/20250506-132413-ladsgroup.json
- 13:16 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Continuing with sync
- 13:14 tgr@deploy1003: tgr, novemlinguae, cyndywikime, lucaswerkmeister-wmde: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), [[gerrit:1136986|Growth-Beta: Configure higher Impact Module edit limits f
- 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P75776 and previous config saved to /var/cache/conftool/dbconfig/20250506-130905-ladsgroup.json
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
- 13:07 tgr@deploy1003: Started scap sync-world: Backport for CommonSettings: Document wmfGetPrivilegedGroups usage, Revert "Add .well-known/matrix for wikimedia.org" (T223835 T261531), core-Permissions: add move-subpages to enwiki templateeditor user group (T393167), Growth-Beta: Configure higher Impact Module edit limits for pilot wikis (T341599), [[
- 13:01 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-staging-worker
- 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75775 and previous config saved to /var/cache/conftool/dbconfig/20250506-125358-ladsgroup.json
- 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T382778)', diff saved to https://phabricator.wikimedia.org/P75774 and previous config saved to /var/cache/conftool/dbconfig/20250506-125034-ladsgroup.json
- 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75773 and previous config saved to /var/cache/conftool/dbconfig/20250506-124954-ladsgroup.json
- 12:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75772 and previous config saved to /var/cache/conftool/dbconfig/20250506-123448-ladsgroup.json
- 12:27 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-staging-worker
- 12:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P75771 and previous config saved to /var/cache/conftool/dbconfig/20250506-121940-ladsgroup.json
- 12:11 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617] (duration: 01m 37s)
- 12:09 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@43a5f617]
- 12:09 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617] (duration: 01m 20s)
- 12:08 joal@deploy1003: Started deploy [analytics/refinery@43a5f61] (thin): Regular analytics weekly train THIN [analytics/refinery@43a5f617]
- 12:07 joal@deploy1003: Finished deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617] (duration: 02m 56s)
- 12:04 joal@deploy1003: Started deploy [analytics/refinery@43a5f61]: Regular analytics weekly train [analytics/refinery@43a5f617]
- 12:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75770 and previous config saved to /var/cache/conftool/dbconfig/20250506-120434-ladsgroup.json
- 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T382778)', diff saved to https://phabricator.wikimedia.org/P75769 and previous config saved to /var/cache/conftool/dbconfig/20250506-120108-ladsgroup.json
- 12:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75768 and previous config saved to /var/cache/conftool/dbconfig/20250506-120045-ladsgroup.json
- 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75767 and previous config saved to /var/cache/conftool/dbconfig/20250506-114538-ladsgroup.json
- 11:43 kamila@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 11:42 kamila@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 11:37 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[2010-2014].codfw.wmnet with reason: Upgrade and restart
- 11:36 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup1013.eqiad.wmnet with reason: Upgrade and restart
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P75766 and previous config saved to /var/cache/conftool/dbconfig/20250506-113031-ladsgroup.json
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75765 and previous config saved to /var/cache/conftool/dbconfig/20250506-111524-ladsgroup.json
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T382778)', diff saved to https://phabricator.wikimedia.org/P75764 and previous config saved to /var/cache/conftool/dbconfig/20250506-111157-ladsgroup.json
- 11:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75763 and previous config saved to /var/cache/conftool/dbconfig/20250506-111146-ladsgroup.json
- 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75762 and previous config saved to /var/cache/conftool/dbconfig/20250506-105639-ladsgroup.json
- 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P75761 and previous config saved to /var/cache/conftool/dbconfig/20250506-104131-ladsgroup.json
- 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75760 and previous config saved to /var/cache/conftool/dbconfig/20250506-102624-ladsgroup.json
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T382778)', diff saved to https://phabricator.wikimedia.org/P75759 and previous config saved to /var/cache/conftool/dbconfig/20250506-102236-ladsgroup.json
- 10:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75758 and previous config saved to /var/cache/conftool/dbconfig/20250506-102226-ladsgroup.json
- 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75757 and previous config saved to /var/cache/conftool/dbconfig/20250506-100719-ladsgroup.json
- 09:57 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 09:57 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 09:56 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 09:55 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 09:55 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 09:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P75756 and previous config saved to /var/cache/conftool/dbconfig/20250506-095212-ladsgroup.json
- 09:44 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 09:43 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 09:42 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 09:42 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 09:41 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 09:40 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
- 09:40 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database nupwiki (T390714)
- 09:40 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database nupwiki (T390714)
- 09:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75755 and previous config saved to /var/cache/conftool/dbconfig/20250506-093704-ladsgroup.json
- 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T382778)', diff saved to https://phabricator.wikimedia.org/P75754 and previous config saved to /var/cache/conftool/dbconfig/20250506-093410-ladsgroup.json
- 09:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:28 lucaswerkmeister-wmde@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 09:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 09:27 lucaswerkmeister-wmde@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 09:26 lucaswerkmeister-wmde@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 07:49 elukey: restart apache2 on puppetmaster1001
- 04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.44.0-wmf.24 (duration: 07m 35s)
- 04:06 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223 (duration: 62m 44s)
- 03:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 03:45 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2015.codfw.wmnet, repooling source-only afterwards
- 03:45 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
- 03:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 03:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2014.codfw.wmnet, repooling source-only afterwards
- 03:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 03:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs2021.codfw.wmnet -> wdqs2008.codfw.wmnet, repooling source-only afterwards
- 03:25 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 03:18 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1011.eqiad.wmnet, repooling source-only afterwards
- 03:18 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
- 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
- 03:17 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:17 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 13s)
- 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:16 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 14s)
- 03:16 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 03:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 03:03 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.28 refs T386223
- 03:00 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:43 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wikidata from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:24 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host (duration: 00m 12s)
- 02:24 ryankemper@deploy1003: Started deploy [wdqs/wdqs@fe88851]: deploy to freshly reimaged host
- 02:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 02:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T388134, bring new main graph hosts into service) xfer wdqs-all from wdqs1021.eqiad.wmnet -> wdqs1016.eqiad.wmnet, repooling source-only afterwards
- 00:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
2025-05-05
- 23:32 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 23:29 eileen: civicrm upgraded from 5a1f3e8e to 6ffbde61
- 23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php enwiki --delete /home/zabe/afl_text_table_deletedump/enwiki --sleep 0.3 # T381599
- 23:04 zabe@deploy1003: Finished scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups (duration: 11m 13s)
- 23:01 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 23:01 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 23:00 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:59 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:57 zabe@deploy1003: zabe, novemlinguae: Continuing with sync
- 22:57 zabe@deploy1003: zabe, novemlinguae: Backport for core-Permissions: refactor enwiki wgRemoveGroups synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:52 zabe@deploy1003: Started scap sync-world: Backport for core-Permissions: refactor enwiki wgRemoveGroups
- 22:47 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs1017.eqiad.wmnet']
- 22:46 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 22:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 22:12 sbassett: Deployed security fix (2) for T392341
- 21:57 sbassett: Deployed security fix (1) for T392341
- 21:34 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 21:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 21:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe1003.wikimedia.org with OS bookworm
- 21:03 jsn@deploy1003: Finished scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401) (duration: 14m 43s)
- 20:59 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host apus-fe1003.wikimedia.org with OS bookworm
- 20:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 20:56 jsn@deploy1003: jsn: Continuing with sync
- 20:56 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:56 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 20:55 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 20:55 jsn@deploy1003: jsn: Backport for Fix link for first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host apus-fe1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe1003
- 20:49 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe1003
- 20:48 jsn@deploy1003: Started scap sync-world: Backport for Fix link for first set of Patroller Tools surveys (T389401)
- 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:48 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
- 20:48 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt apus-fe1003 - vriley@cumin1002"
- 20:44 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on wdqs[2008,2014-2015].codfw.wmnet,wdqs[1011,1016].eqiad.wmnet with reason: T388134
- 20:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
- 20:35 jsn@deploy1003: Finished scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) (duration: 19m 58s)
- 20:28 jsn@deploy1003: dani, jsn: Continuing with sync
- 20:21 jsn@deploy1003: dani, jsn: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1184.eqiad.wmnet with OS bullseye
- 20:15 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 20:15 jsn@deploy1003: Started scap sync-world: Backport for Design Research Participant Survey: Undeploy (T392325), Deploy first set of Patroller Tools surveys (T389401)
- 20:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
- 19:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 19:46 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
- 19:43 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1184.eqiad.wmnet with reason: host reimage
- 19:37 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:27 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1184.eqiad.wmnet with OS bullseye
- 18:12 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 18:07 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 18:07 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 18:03 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 17:46 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:30 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:30 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 17:19 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from elastic1111 to cirrussearch1111
- 17:19 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:19 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
- 17:19 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rolling back cirrussearch1111 to elastic1111 - bking@cumin2002"
- 17:16 swfrench@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 17:16 swfrench@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 16:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 16:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 16:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
- 16:49 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:48 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
- 16:48 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:48 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:45 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming elastic1111 to cirrussearch1111 - bking@cumin2002"
- 16:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:39 bking@cumin2002: START - Cookbook sre.dns.netbox
- 16:38 bking@cumin2002: START - Cookbook sre.hosts.rename from elastic1111 to cirrussearch1111
- 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 16:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2047 to codfw - jhancock@cumin2002"
- 16:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 16:20 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/miscweb: apply
- 16:09 aokoth@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/miscweb: apply
- 16:07 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:06 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:03 aokoth@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 16:02 aokoth@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
- 15:46 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:46 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 15:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:46 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:45 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 15:45 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:44 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2048.codfw.wmnet with OS bookworm
- 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2047.codfw.wmnet with OS bookworm
- 15:33 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 15:33 dancy@deploy1003: Installation of scap version "4.160.0" completed for 2 hosts
- 15:32 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 15:32 hoo@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 15:32 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1031.eqiad.wmnet
- 15:32 hoo@deploy1003: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
- 15:32 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 15:31 hoo@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:31 dancy@deploy1003: Installing scap version "4.160.0" for 2 host(s)
- 15:31 hoo@deploy1003: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
- 15:30 hoo@deploy1003: helmfile [staging] DONE helmfile.d/services/wikidata-query-gui: apply
- 15:29 hoo@deploy1003: helmfile [staging] START helmfile.d/services/wikidata-query-gui: apply
- 15:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
- 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
- 15:25 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:25 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:23 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:23 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 15:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
- 15:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
- 15:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
- 15:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
- 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:12 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:12 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 15:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 15:11 kartik@deploy1003: Finished scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" (duration: 30m 27s)
- 15:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
- 15:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
- 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
- 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
- 15:00 kartik@deploy1003: kartik: Continuing with sync
- 14:58 kartik@deploy1003: kartik: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
- 14:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
- 14:44 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 14:42 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 14:40 kartik@deploy1003: Started scap sync-world: Backport for Revert "Remove links to Special:ContentTranslationStats from dashboards"
- 14:39 kartik@deploy1003: Finished scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) (duration: 19m 32s)
- 14:38 fabfur: upgrading haproxykafka to version 0.3.10 on A:cp (T393016)
- 14:29 kartik@deploy1003: cyndywikime, kartik: Continuing with sync
- 14:27 fabfur: enabled puppet and repooled cp7001 (T393016)
- 14:27 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
- 14:25 kartik@deploy1003: cyndywikime, kartik: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:23 fabfur: uploading haproxykafka 0.3.10 on apt repo (T393016)
- 14:19 kartik@deploy1003: Started scap sync-world: Backport for Growth: Remove GELevelingUpFeaturesEnabled and GEMentorDashboardEnabled feature flags (T379566)
- 14:14 kartik@deploy1003: Sync cancelled.
- 14:10 kartik@deploy1003: kartik, abi: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:52 kartik@deploy1003: Started scap sync-world: Backport for Remove links to Special:ContentTranslationStats from dashboards (T392839)
- 13:47 kartik@deploy1003: Finished scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) (duration: 13m 23s)
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
- 13:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
- 13:43 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:43 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
- 13:43 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:42 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.sanitize-wiki (exit_code=1) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:42 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:41 kartik@deploy1003: kartik, abi: Continuing with sync
- 13:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
- 13:39 kartik@deploy1003: kartik, abi: Backport for Disable APIs used in Special:ContentTranslationStats (T392839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:34 kartik@deploy1003: Started scap sync-world: Backport for Disable APIs used in Special:ContentTranslationStats (T392839)
- 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
- 13:33 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:33 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:27 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Setting up permissions and view database sanitization for wikis nupwiki in section s5
- 13:21 kartik@deploy1003: Finished scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) (duration: 15m 29s)
- 13:20 fabfur: disabled puppet on cp7001 to test haproxykafka version (T393016)
- 13:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
- 13:18 fabfur: depooling cp7001 to test new haproxykafka version (T393016)
- 13:14 kartik@deploy1003: kartik, abi: Continuing with sync
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
- 13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
- 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 13:11 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 13:10 kartik@deploy1003: kartik, abi: Backport for Disable Special:ContentTranslationStats page (T392839 T325790) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:09 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:08 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:07 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
- 13:06 klausman@cumin2002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:ml-serve-worker-eqiad
- 13:06 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
- 13:06 kartik@deploy1003: Started scap sync-world: Backport for Disable Special:ContentTranslationStats page (T392839 T325790)
- 13:04 tappof: rebooting centrallog1002 to rollback the kernel
- 13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
- 12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
- 12:56 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
- 12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
- 12:47 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
- 12:47 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
- 12:43 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
- 12:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
- 12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
- 12:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
- 12:28 tappof: Rolling reboot of Prometheus nodes in eqiad (1005, 1006, 1008) to rollback the kernel
- 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
- 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
- 12:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
- 12:06 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d] (duration: 01m 07s)
- 12:04 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (thin): Deploying new refinery/source artifacts THIN [analytics/refinery@dbfa557d]
- 12:04 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d] (duration: 03m 17s)
- 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
- 12:01 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557]: Deploying new refinery/source artifacts [analytics/refinery@dbfa557d]
- 12:00 aqu@deploy1003: Finished deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d] (duration: 00m 53s)
- 11:59 aqu@deploy1003: Started deploy [analytics/refinery@dbfa557] (hadoop-test): Deploying new refinery/source artifacts TEST [analytics/refinery@dbfa557d]
- 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
- 11:58 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
- 11:56 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
- 11:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
- 11:49 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 11:46 filippo@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host prometheus2006.codfw.wmnet
- 11:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
- 11:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
- 11:44 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
- 11:38 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
- 11:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
- 11:12 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1003.eqiad.wmnet
- 11:05 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1003.eqiad.wmnet
- 11:05 jynus@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 6:00:00 on backup[1010-1014].eqiad.wmnet with reason: Upgrade and restart
- 11:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:57 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 10:57 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:35 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=inference,name=codfw
- 10:32 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 10:32 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2002.codfw.wmnet
- 10:24 tappof: rebooting prometheus1007 into linux-image-6.1.0-33-amd64
- 10:17 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2002.codfw.wmnet
- 09:58 elukey@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 09:39 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 09:38 elukey: depool inference/codfw from DNS discovery to safely apply new pod/container security settings - T369493
- 09:30 dreamyjazz@deploy1003: Finished scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) (duration: 13m 04s)
- 09:23 dreamyjazz@deploy1003: dreamyjazz, msz2001: Continuing with sync
- 09:21 dreamyjazz@deploy1003: dreamyjazz, msz2001: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:17 dreamyjazz@deploy1003: Started scap sync-world: Backport for [plwiki] Add 'abusefilter-view-private' to sysop (T393353)
- 09:03 godog: powercycle vrts1003 + vrts2002 - soft lockup T393357
- 08:56 godog: powercycle centrallog2002 - cannot log in on ssh or console
- 08:40 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2015.codfw.wmnet with OS bullseye
- 08:32 tappof: rebooting prometheus2007 - no ssh, com2 via racadm hangs
- 08:32 godog: powercycle centrallog1002 - cannot log in on ssh or console
- 08:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
- 08:17 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2015.codfw.wmnet with reason: host reimage
- 08:17 tappof: powercycle prometheus2008 - no ssh, mgmt console showing systemd units being deactivated, no root login
- 08:15 elukey: powercycle prometheus2005 - no ssh, mgmt console showing systemd units being deactivated, no root login
- 08:11 elukey: powercycle prometheus1008 - no ssh, mgmt console showing cpu soft lockup continuously
- 08:05 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 08:05 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 08:02 tappof: rebooting prometheus1005 prometheus1006 and prometheus2006
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2015
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2015
- 08:00 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2015
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:00 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2015.codfw.wmnet 209.48.192.10.in-addr.arpa 9.0.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
- 08:00 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2015 - ryankemper@cumin2002"
- 07:59 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 07:59 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 07:59 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 07:58 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 07:54 Dreamy_Jazz: UTC morning backport window finished
- 07:54 dreamyjazz@deploy1003: Finished scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) (duration: 14m 11s)
- 07:47 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Continuing with sync
- 07:44 dreamyjazz@deploy1003: dreamyjazz, bunnypranav, anzx: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711) synced to the testservers (https://wikitech.wikimedia.org
- 07:40 dreamyjazz@deploy1003: Started scap sync-world: Backport for nnwiki: enable wgCiteResponsiveReferences (T393299), ruwikibooks: enable VisualEditorAvailableNamespaces for Рецепт (recipe) namespace (T392803), Add checkuserwiki favicon (T393246), nupwiki: add timezone (T390711)
- 07:31 kartik@deploy1003: Finished scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) (duration: 17m 27s)
- 07:25 kartik@deploy1003: abi, kartik: Continuing with sync
- 07:21 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 07:21 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2015
- 07:20 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2015.codfw.wmnet with OS bullseye
- 07:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2014.codfw.wmnet with OS bullseye
- 07:19 kartik@deploy1003: abi, kartik: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 07:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 07:14 kartik@deploy1003: Started scap sync-world: Backport for Mobile frequent languages entrypoint: Add dependency to sitemapper (T393144 T386223)
- 07:11 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 07:11 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 07:02 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
- 06:57 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2014.codfw.wmnet with reason: host reimage
- 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2014
- 06:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2014
- 06:37 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2014
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 06:37 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2014.codfw.wmnet 192.16.192.10.in-addr.arpa 2.9.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 06:37 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
- 06:37 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2014 - ryankemper@cumin2002"
- 06:30 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 06:27 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2014
- 06:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2014.codfw.wmnet with OS bullseye
- 06:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
- 06:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
- 06:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
- 06:16 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
- 05:49 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs2008.codfw.wmnet with OS bullseye
- 05:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
- 05:25 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs2008.codfw.wmnet with reason: host reimage
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs2008
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs2008
- 05:06 ryankemper@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs2008
- 05:06 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 05:06 ryankemper@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs2008.codfw.wmnet 194.32.192.10.in-addr.arpa 4.9.1.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 05:05 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
- 05:05 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs2008 - ryankemper@cumin2002"
- 05:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1017.eqiad.wmnet with OS bullseye
- 05:00 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs2008
- 04:58 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs2008.codfw.wmnet with OS bullseye
- 04:38 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 04:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 04:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 04:19 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 04:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 04:13 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2002-dev.codfw.wmnet with reason: host reimage
- 04:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2003-dev.codfw.wmnet with reason: host reimage
- 04:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 04:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudrabbit2001-dev.codfw.wmnet with reason: host reimage
- 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2003-dev.codfw.wmnet with OS bookworm
- 03:54 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2002-dev.codfw.wmnet with OS bookworm
- 03:53 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2009-dev to cloudrabbit2003-dev
- 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2003-dev
- 03:52 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2003-dev
- 03:52 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2008-dev to cloudrabbit2002-dev
- 03:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2002-dev
- 03:49 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2002-dev
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:49 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
- 03:48 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2008-dev to cloudrabbit2002-dev - andrew@cumin1002"
- 03:46 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudrabbit2001-dev.codfw.wmnet with OS bookworm
- 03:44 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2009-dev to cloudrabbit2003-dev
- 03:43 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2008-dev to cloudrabbit2002-dev
- 03:43 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1017.eqiad.wmnet with OS bullseye
- 03:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from cloudcontrol2007-dev to cloudrabbit2001-dev
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudrabbit2001-dev
- 03:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1016.eqiad.wmnet with OS bullseye
- 03:42 andrew@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudrabbit2001-dev
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:42 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
- 03:41 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming cloudcontrol2007-dev to cloudrabbit2001-dev - andrew@cumin1002"
- 03:37 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 03:36 andrew@cumin1002: START - Cookbook sre.hosts.rename from cloudcontrol2007-dev to cloudrabbit2001-dev
- 03:26 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 03:24 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1016.eqiad.wmnet with reason: host reimage
- 02:59 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1016.eqiad.wmnet with OS bullseye
- 01:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wdqs1011.eqiad.wmnet with OS bullseye
- 01:39 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
- 01:36 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1011.eqiad.wmnet with reason: host reimage
- 01:19 ryankemper@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1011.eqiad.wmnet with OS bullseye
2025-05-04
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1003.eqiad.wmnet
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:27 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:22 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 23:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1003.eqiad.wmnet
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1002.eqiad.wmnet
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:08 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 23:02 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1002.eqiad.wmnet
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1001.eqiad.wmnet
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:02 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 23:01 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
- 22:57 andrew@cumin1002: START - Cookbook sre.dns.netbox
- 22:52 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1001.eqiad.wmnet
- 20:29 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 20:29 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 20:07 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
- 20:06 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1056*,elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:43 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:43 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1063* for host appears to have hot shards - bking@cumin2002
- 19:35 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
- 19:35 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1062* for hosts appear to have hot shards - bking@cumin2002
- 19:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
- 19:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057*,elastic1058* for hosts appear to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1057* for host appears to have hot shards - bking@cumin2002
- 19:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
- 19:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1064* for host appears to have hot shards - bking@cumin2002
- 10:36 krinkle@deploy1003: Finished scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages (duration: 30m 22s)
- 10:26 krinkle@deploy1003: krinkle: Continuing with sync
- 10:22 krinkle@deploy1003: krinkle: Backport for actions: Fix handling of redirects to known (non-existing) pages synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:06 krinkle@deploy1003: Started scap sync-world: Backport for actions: Fix handling of redirects to known (non-existing) pages
2025-05-03
- 20:09 taavi@cumin1002: dbctl commit (dc=all): 'depool db1246', diff saved to https://phabricator.wikimedia.org/P75739 and previous config saved to /var/cache/conftool/dbconfig/20250503-200910-taavi.json
- 18:35 hnowlan: delete a stuck thumbor pod in codfw
- 13:53 krinkle@deploy1003: Finished scap sync-world: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821) (duration: 16m 22s)
- 13:46 krinkle@deploy1003: krinkle: Continuing with sync
- 13:41 krinkle@deploy1003: krinkle: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:36 krinkle@deploy1003: Started scap sync-world: Backport for multiversion: Remove getMWConfigForCacheing() as identical to getConfigGlobals() (T169821), tests: Move buildLogoHTML.php to tests/ alongside buildConfigCache.php, multiversion: Separate wmf-config reading from actual Multiversion (T169821)
- 12:19 reedy@deploy1003: Synchronized wmf-config/InitialiseSettings-labs.php: Allow all users to use 2FA on beta (duration: 11m 14s)
2025-05-02
- 21:38 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1054.eqiad.wmnet with OS bookworm
- 21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:34 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:31 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:29 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1053.eqiad.wmnet with OS bookworm
- 20:23 tzatziki: removed 3 files for legal compliance
- 20:18 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1054.eqiad.wmnet with OS bookworm
- 20:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:15 tzatziki: removed 1 file for legal compliance
- 20:11 tzatziki: removed 1 file for legal compliance
- 20:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 20:09 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:57 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1054.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:41 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host ganeti1053.eqiad.wmnet with OS bookworm
- 19:38 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 19:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host ganeti1053.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 17:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1168.eqiad.wmnet
- 17:27 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1168.eqiad.wmnet
- 17:26 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1167.eqiad.wmnet
- 17:19 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1167.eqiad.wmnet
- 17:17 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1166.eqiad.wmnet
- 17:09 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
- 16:53 sukhe@dns1004: END - running authdns-update
- 16:51 sukhe@dns1004: START - running authdns-update
- 16:47 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f1-codfw.mgmt.codfw.wmnet
- 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1016.eqiad.wmnet with reason: not yet in prod
- 16:28 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 18:00:00 on ms-fe1015.eqiad.wmnet with reason: not yet in prod
- 16:26 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 16:24 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-f1-codfw.mgmt.codfw.wmnet
- 15:45 stevemunene@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1166.eqiad.wmnet
- 15:11 herron: power cycling prometheus200[78] via rac
- 15:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1168.eqiad.wmnet
- 15:05 jgleeson: SmashPig changed from 9b3c4587 to ddf64519
- 15:04 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1168.eqiad.wmnet
- 15:03 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1167.eqiad.wmnet
- 15:01 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 15:01 bking@cumin2002: conftool action : set/pooled=yes:weight=10; selector: name=cirrussearch2076.codfw.wmnet|cirrussearch2080.codfw.wmnet|cirrussearch2081.codfw.wmnet|cirrussearch2083.codfw.wmnet|cirrussearch2084.codfw.wmnet|cirrussearch2092.codfw.wmnet|cirrussearch2093.codfw.wmnet|cirrussearch2100.codfw.wmnet|cirrussearch2106.codfw.wmnet|cirrussearch2108.codfw.wmnet
- 15:01 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1166.eqiad.wmnet
- 14:55 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
- 14:48 dancy@deploy1003: Installation of scap version "4.159.0" completed for 2 hosts
- 14:46 dancy@deploy1003: Installing scap version "4.159.0" for 2 host(s)
- 14:11 inflatador: bking@localhost set search_codfw num_concurrent_incoming_recoveries from 20 back down to 4 after migration T391350
- 13:49 moritzm: imported ruby-defaults 1:3.3~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 13:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2008.wikimedia.org
- 13:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2008.wikimedia.org
- 13:25 urandom: invoked manual `garbagecollect`, Cassandra sessionstore — T390514
- 13:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2007.codfw.wmnet
- 13:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2007.codfw.wmnet
- 12:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2006.codfw.wmnet
- 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2006.codfw.wmnet
- 10:06 moritzm: imported ruby-concurrent 1.1.6+dfsg-5~wmf13u1 to component/puppet7 for trixie-wikimedia T392790
- 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2001.codfw.wmnet
- 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2001.codfw.wmnet
- 09:54 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
- 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
- 09:31 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
- 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2003.codfw.wmnet
- 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2003.codfw.wmnet
- 08:29 XioNoX: update codfw pfw NAT - T392843
- 08:16 jmm@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
- 08:13 XioNoX: push pfw policies - T393098
- 08:09 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
- 06:46 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 06:42 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 06:30 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MarkTraceur out of all services on: 2404 hosts
- 06:21 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1167.eqiad.wmnet
- 06:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1167.eqiad.wmnet
- 06:14 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1166.eqiad.wmnet
- 06:09 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1166.eqiad.wmnet
- 00:41 dwisehaupt: starting staging db refresh on frdb1006 with civicrm/drupal/fredge restores from 20250430
2025-05-01
- 22:27 thcipriani: mwscript-k8s -- resetAuthenticationThrottle.php --wiki=aawiki --signup --ip=<istanbul ips> (x17)
- 22:09 dzahn@deploy1003: Finished scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 14m 32s)
- 22:02 dzahn@deploy1003: dzahn: Continuing with sync
- 22:00 dzahn@deploy1003: dzahn: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:54 dzahn@deploy1003: Started scap sync-world: Backport for Add another throttle rule for Istanbul Hackathon 2025 (T382309)
- 21:40 dzahn@deploy1003: Finished scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) (duration: 25m 16s)
- 21:34 dzahn@deploy1003: dzahn: Continuing with sync
- 21:20 dzahn@deploy1003: dzahn: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:15 dzahn@deploy1003: Started scap sync-world: Backport for Add throttle rule for Istanbul Hackathon 2025 (T382309)
- 21:03 ryankemper: T376151 [wdqs-internal lvs teardown] Declaring this officially done. No more irc log spam from me today :)
- 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
- 21:01 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove VIPs for wdqs-internal - ryankemper@cumin2002"
- 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/codfw/wdqs-internal/`
- 21:01 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/wdqs` && `sudo etcdctl -C https://conf1007.eqiad.wmnet:4001 --username root rmdir /conftool/v1/pools/eqiad/wdqs-internal/`
- 20:54 ryankemper: T376151 [wdqs-internal lvs teardown] `sudo rm -fv /srv/config-master/pybal/eqiad/wdqs-internal && sudo rm -fv /srv/config-master/pybal/codfw/wdqs-internal` on `config-master[1,2]001`
- 20:53 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 20:50 ryankemper: T376151 [wdqs-internal lvs teardown] Surrendered `10.2.2.41/32` (eqiad wdqs-internal vip) and `10.2.1.41/32` (codfw wdqs-internal vip) from netbox interface
- 20:48 ryankemper@dns1004: END - running authdns-update
- 20:46 ryankemper@dns1004: START - running authdns-update
- 20:45 jhuneidi@deploy1003: Finished scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) (duration: 30m 35s)
- 20:40 sukhe: restart pybal on lvs1020
- 20:35 jhuneidi@deploy1003: bvibber, jhuneidi: Continuing with sync
- 20:33 jhuneidi@deploy1003: bvibber, jhuneidi: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:32 sukhe: sudo cumin 'O:config_master' 'run-puppet-agent'
- 20:14 jhuneidi@deploy1003: Started scap sync-world: Backport for Check for content validity before extracting license (T389125), Fix localization for validation errors checking tabular data (T389126)
- 19:37 sukhe: no pending Netbox changes
- 19:37 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:34 sukhe: [correction] running sre.dns.netbox to ensure no pending changes (NOT in dry-run)
- 19:34 sukhe: running sre.dns.netbox to ensure no pending changes
- 19:34 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 19:33 dduvall: re-ran scap sync to fix mw-jobrunner codfw deployments following failed helmfile apply and verified correct image ref manually (T386222)
- 19:30 dduvall@deploy1003: Finished scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw) (duration: 11m 24s)
- 19:20 sukhe: sukhe@netbox1003:~$ sudo systemctl start uwsgi-netbox.service: service was OOM'ed, restarting
- 19:18 dduvall@deploy1003: Started scap sync-world: retrying sync-world following spurious helmfile apply error (mw-jobrunner codfw)
- 19:16 jhathaway@dns1004: END - running authdns-update
- 19:14 jhathaway@dns1004: START - running authdns-update
- 19:09 ryankemper: T376151 [wdqs-internal lvs teardown] running puppet across `A:wdqs-internal` now that pybal has been restarted
- 19:09 dduvall: deployment of mw-jobrunner-main for codfw failed during scap train (group2) (T386222)
- 19:09 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] all IPVS diff check alerts have recovered, rolling restart complete
- 19:06 dduvall: helm error during group2 deployment "Get "https://kubemaster.svc.codfw.wmnet:6443/api/v1/namespaces/mw-jobrunner/services/mediawiki-main-tls-service": dial tcp 10.2.1.8:6443: connect: no route to host - error from a previous attempt: read tcp 10.64.16.93:41894->10.2.1.8:6443: read: connection reset by peer"
- 19:04 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.2.41:80` on `lvs1019` and `lvs1020`
- 19:03 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] `ipvsadm --delete-service --tcp-service 10.2.1.41:80` on `A:lvs-secondary-codfw OR A:lvs-low-traffic-codfw`(lvs2013, lvs2014)
- 18:59 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-codfw` (lvs2013)
- 18:58 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-codfw` (lvs2014), waiting 2 mins before proceeding
- 18:55 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-low-traffic-eqiad` (lvs1019), waiting a few mins before proceeding
- 18:48 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] Restarted pybal on `A:lvs-secondary-eqiad`, it only restarted on `lvs1020` but for some reason `lvs1013` doesn't have a pybal service running
- 18:44 ryankemper: T376151 [wdqs-internal lvs teardown -> pybal rolling restart] ran puppet on `O:Lvs::balancer` after merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136747
- 18:32 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 18:31 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 18:30 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 18:29 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: apply
- 18:28 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 18:27 eevans@deploy1003: helmfile [staging] START helmfile.d/services/echostore: apply
- 18:26 ryankemper: T376151 (wdqs-internal lvs teardown) Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/1136744 to flip `wdqs-internal` service state to `lvs_setup` and running puppet across `A:dnsbox`
- 18:24 dduvall@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.27 refs T386222
- 18:23 ryankemper@dns1004: END - running authdns-update
- 18:21 ryankemper@dns1004: START - running authdns-update
- 17:31 jhathaway: testing sasl email relaying on mx-in{1001,2001}
- 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 16:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:02 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti2045.codfw.wmnet with OS bookworm
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
- 15:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti2045.codfw.wmnet with reason: host reimage
- 15:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:34 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2048.codfw.wmnet with OS bookworm
- 15:29 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2047.codfw.wmnet with OS bookworm
- 15:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
- 15:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 15:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:54 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:51 TheresNoTime: ran `[samtar@deploy1003 ~]$ mwscript-k8s --comment="T393093" --follow -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=knwikiquote --logwiki=metawiki '~aanzx' 'A826'` for T393093
- 13:49 samtar@deploy1003: Finished scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) (duration: 12m 44s)
- 13:42 samtar@deploy1003: anzx, samtar: Continuing with sync
- 13:41 samtar@deploy1003: anzx, samtar: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 urandom: invoking garbagecollect on sessionstore cluster — T390514
- 13:36 samtar@deploy1003: Started scap sync-world: Backport for mswikisource: add NamespacesToBeSearchedDefault (T392984)
- 13:34 urandom: lowering sessionstore gc_grace_seconds to 172800 (two days) — T390514
- 13:31 samtar@deploy1003: Finished scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) (duration: 21m 53s)
- 13:24 samtar@deploy1003: gergesshamon, samtar: Continuing with sync
- 13:17 samtar@deploy1003: gergesshamon, samtar: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 samtar@deploy1003: Started scap sync-world: Backport for [arwiki] Change logo and tagline with sync wordmark (T392858)
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:46 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:24 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
- 11:12 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
- 09:46 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 09:45 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 05:29 eileen: civicrm upgraded from 6c99f0c9 to 5a1f3e8e
- 05:14 eileen: config revision changed from b200409c to ddf64519
- 01:32 tstarling@deploy1003: Finished scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) (duration: 13m 52s)
- 01:25 tstarling@deploy1003: tstarling, musikanimal: Continuing with sync
- 01:25 tstarling@deploy1003: tstarling, musikanimal: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 01:18 tstarling@deploy1003: Started scap sync-world: Backport for testwiki: enable wgUseCodexSpecialBlock and wgEnableMultiBlocks (T377121)