Server Admin Log/Archive 83
2024-07-31
- 22:23 pt1979@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 22:19 pt1979@cumin1002: START - Cookbook sre.dns.netbox
- 22:17 pt1979@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 22:15 pt1979@cumin1002: START - Cookbook sre.dns.netbox
- 22:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
- 22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
- 21:52 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 21:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 21:17 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@82674dc]: deploy hot airflow analytics dag hot fix T368756 (duration: 01m 05s)
- 21:16 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@82674dc]: deploy hot airflow analytics dag hot fix T368756
- 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
- 21:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
- 21:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp7015.magru.wmnet
- 20:45 cjming: end of UTC late backport window
- 20:44 cjming@deploy1003: Finished scap: Backport for beta: Enable NetworkSession extension (T355267) (duration: 07m 47s)
- 20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
- 20:39 cjming@deploy1003: ebernhardson, cjming: Backport for beta: Enable NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:37 cjming@deploy1003: Started scap sync-world: Backport for beta: Enable NetworkSession extension (T355267)
- 20:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:31 cjming@deploy1003: Finished scap: Backport for [arwiki] Set noindex for namespace user (T371470) (duration: 17m 28s)
- 20:27 cjming@deploy1003: cjming, gergesshamon: Continuing with sync
- 20:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
- 20:19 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:16 cjming@deploy1003: cjming, gergesshamon: Backport for [arwiki] Set noindex for namespace user (T371470) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:14 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 20:14 cjming@deploy1003: Started scap sync-world: Backport for [arwiki] Set noindex for namespace user (T371470)
- 20:12 cjming@deploy1003: Finished scap: Backport for [wmf-config] Remove trailing slash in SSO domain (duration: 08m 04s)
- 20:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 20:07 cjming@deploy1003: cjming, d3r1ck01: Continuing with sync
- 20:06 cjming@deploy1003: cjming, d3r1ck01: Backport for [wmf-config] Remove trailing slash in SSO domain synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:06 cstone: payments-wiki upgraded from c4c43c74 to e8d1c5ad
- 20:04 cjming@deploy1003: Started scap sync-world: Backport for [wmf-config] Remove trailing slash in SSO domain
- 20:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
- 20:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
- 19:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
- 19:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
- 19:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 19:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:13 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@ea93090]: deploy latest DAGS to analytics Airflow instance. (duration: 01m 30s)
- 19:11 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@ea93090]: deploy latest DAGS to analytics Airflow instance.
- 18:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
- 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.16 refs T366961
- 18:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 18:09 brennen: 1.43.0-wmf.16 train (T366961): no current blockers, logs clean, rolling to group1.
- 17:52 ejegg: payments-wiki upgraded from 91624a2e to c4c43c74
- 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67171 and previous config saved to /var/cache/conftool/dbconfig/20240731-171255-marostegui.json
- 17:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 17:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67170 and previous config saved to /var/cache/conftool/dbconfig/20240731-171233-marostegui.json
- 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67169 and previous config saved to /var/cache/conftool/dbconfig/20240731-165726-marostegui.json
- 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67168 and previous config saved to /var/cache/conftool/dbconfig/20240731-164219-marostegui.json
- 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67167 and previous config saved to /var/cache/conftool/dbconfig/20240731-162712-marostegui.json
- 16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 16:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 16:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 16:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:04 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 15:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:56 ayounsi@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 15:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67166 and previous config saved to /var/cache/conftool/dbconfig/20240731-154912-root.json
- 15:40 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67165 and previous config saved to /var/cache/conftool/dbconfig/20240731-153407-root.json
- 15:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
- 15:30 jgiannelos@deploy1003: Finished deploy [restbase/deploy@59a40a0]: (no justification provided) (duration: 19m 22s)
- 15:28 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
- 15:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:19 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67164 and previous config saved to /var/cache/conftool/dbconfig/20240731-151901-root.json
- 15:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:11 jgiannelos@deploy1003: Started deploy [restbase/deploy@59a40a0]: (no justification provided)
- 15:04 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67163 and previous config saved to /var/cache/conftool/dbconfig/20240731-150356-root.json
- 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67162 and previous config saved to /var/cache/conftool/dbconfig/20240731-144850-root.json
- 14:45 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 5%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240731-143340-root.json
- 14:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:21 sukhe: [done] upgrade cp4044 to ATS 9.2.5: T339134
- 14:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
- 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2148', diff saved to https://phabricator.wikimedia.org/P67160 and previous config saved to /var/cache/conftool/dbconfig/20240731-141959-marostegui.json
- 14:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 14:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 14:17 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
- 13:54 Lucas_WMDE: UTC afternoon backport+config window done
- 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) (duration: 11m 31s)
- 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Continuing with sync
- 13:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
- 13:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:45 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:44 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433)
- 13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns [reason: [done] pdns-rec upgrade]
- 13:39 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns [reason: pdns-rec upgrade]
- 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) (duration: 12m 34s)
- 13:39 sukhe: upgrade pdns-recursor to 4.8.8 from 4.8.7 on dns6001
- 13:34 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
- 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:27 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:26 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455)
- 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Fix tracking parameter casing (T370045) (duration: 12m 30s)
- 13:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
- 13:24 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:21 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
- 13:19 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
- 13:18 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:16 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Backport for Fix tracking parameter casing (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:13 fabfur: running `sudo cumin -b 1 -s300 A:cp-ulsfo 'depool-cdn && sleep 30 && enable-puppet "T370741" && run-puppet-agent && pool-cdn'` (T370741)
- 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Fix tracking parameter casing (T370045)
- 12:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet [reason: pooling after cookbook depooled as puppet was disabled]
- 12:57 elukey: update debmonitor-server and python3-debmonitor to bookworm-wikimedia - T368744
- 12:54 sukhe@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
- 12:53 sukhe: upgrade cp4044 to ATS 9.2.5: T339134
- 12:53 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
- 12:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
- 12:50 fabfur: repool cp4037, haproxy configuration modified to exclude benthos logging (T370741)
- 12:46 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 12:44 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 12:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
- 12:39 fabfur: temporary depooling cp4037 to test remove all Benthos resources (T370741)
- 12:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
- 12:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 12:30 fabfur: temporary disabling puppet on cp-ulsfo to test remove benthos from cp4037 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1057823) (T370741)
- 12:25 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
- 12:22 klausman@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:12 dreamyjazz@deploy1003: Finished scap: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) (duration: 08m 57s)
- 12:11 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/updateMetrics.php --wiki=commonswiki --verbose`
- 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67159 and previous config saved to /var/cache/conftool/dbconfig/20240731-120844-root.json
- 12:07 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 12:07 dreamyjazz@deploy1003: dreamyjazz: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:06 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse2001.codfw.wmnet
- 12:06 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse2001.codfw.wmnet
- 12:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364)
- 11:55 klausman@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67158 and previous config saved to /var/cache/conftool/dbconfig/20240731-115338-root.json
- 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67156 and previous config saved to /var/cache/conftool/dbconfig/20240731-113833-root.json
- 11:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
- 11:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
- 11:25 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse1001.eqiad.wmnet
- 11:25 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse1001.eqiad.wmnet
- 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67155 and previous config saved to /var/cache/conftool/dbconfig/20240731-112327-root.json
- 11:11 urbanecm@deploy1003: Finished scap: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) (duration: 08m 02s)
- 11:11 claime: Removing /var/lib/puppet/server/ssl/ca/signed/docker-registry.discovery.wmnet.pem on puppetmaster1001
- 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67154 and previous config saved to /var/cache/conftool/dbconfig/20240731-110822-root.json
- 11:07 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:07 urbanecm@deploy1003: urbanecm: Continuing with sync
- 11:05 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse2001.codfw.wmnet with OS bullseye
- 11:05 urbanecm@deploy1003: urbanecm: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:03 urbanecm@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433)
- 11:01 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1001.eqiad.wmnet with OS bullseye
- 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67153 and previous config saved to /var/cache/conftool/dbconfig/20240731-105317-root.json
- 10:46 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
- 10:46 dreamyjazz@deploy1003: Finished scap: Backport for Unblock CI (T371324), Unblock CI (T371324) (duration: 07m 29s)
- 10:43 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
- 10:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
- 10:41 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
- 10:41 dreamyjazz@deploy1003: dreamyjazz: Backport for Unblock CI (T371324), Unblock CI (T371324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:39 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
- 10:39 dreamyjazz@deploy1003: Started scap sync-world: Backport for Unblock CI (T371324), Unblock CI (T371324)
- 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67152 and previous config saved to /var/cache/conftool/dbconfig/20240731-103811-root.json
- 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T371462', diff saved to https://phabricator.wikimedia.org/P67151 and previous config saved to /var/cache/conftool/dbconfig/20240731-103704-marostegui.json
- 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2220 to s7 primary T371462', diff saved to https://phabricator.wikimedia.org/P67150 and previous config saved to /var/cache/conftool/dbconfig/20240731-103513-root.json
- 10:33 marostegui: Starting s7 codfw failover from db2218 to db2220 - T371462
- 10:26 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse2001.codfw.wmnet with OS bullseye
- 10:25 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse1001.eqiad.wmnet with OS bullseye
- 10:18 akosiaris: revoke docker-registry.discovery.wmnet old certificate from Puppet CA that would expire in a few days. It hasn't been in use since https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018251
- 10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
- 10:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
- 10:14 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided) (duration: 00m 30s)
- 10:13 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided)
- 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2220 from API/vslow/dump T371462', diff saved to https://phabricator.wikimedia.org/P67149 and previous config saved to /var/cache/conftool/dbconfig/20240731-095640-root.json
- 09:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
- 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2220 with weight 0 T371462', diff saved to https://phabricator.wikimedia.org/P67148 and previous config saved to /var/cache/conftool/dbconfig/20240731-095609-root.json
- 09:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2220', diff saved to https://phabricator.wikimedia.org/P67147 and previous config saved to /var/cache/conftool/dbconfig/20240731-095545-marostegui.json
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67146 and previous config saved to /var/cache/conftool/dbconfig/20240731-095200-root.json
- 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67145 and previous config saved to /var/cache/conftool/dbconfig/20240731-095050-root.json
- 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67144 and previous config saved to /var/cache/conftool/dbconfig/20240731-093654-root.json
- 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67143 and previous config saved to /var/cache/conftool/dbconfig/20240731-093545-root.json
- 09:25 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
- 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67142 and previous config saved to /var/cache/conftool/dbconfig/20240731-092149-root.json
- 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67141 and previous config saved to /var/cache/conftool/dbconfig/20240731-092039-root.json
- 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Move db2121 to vslow T371361', diff saved to https://phabricator.wikimedia.org/P67140 and previous config saved to /var/cache/conftool/dbconfig/20240731-091706-root.json
- 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220 T371361', diff saved to https://phabricator.wikimedia.org/P67139 and previous config saved to /var/cache/conftool/dbconfig/20240731-091450-root.json
- 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67138 and previous config saved to /var/cache/conftool/dbconfig/20240731-090643-root.json
- 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67137 and previous config saved to /var/cache/conftool/dbconfig/20240731-085138-root.json
- 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67136 and previous config saved to /var/cache/conftool/dbconfig/20240731-084705-root.json
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67135 and previous config saved to /var/cache/conftool/dbconfig/20240731-083633-root.json
- 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67134 and previous config saved to /var/cache/conftool/dbconfig/20240731-083159-root.json
- 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67133 and previous config saved to /var/cache/conftool/dbconfig/20240731-082127-root.json
- 08:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2205 T371455', diff saved to https://phabricator.wikimedia.org/P67132 and previous config saved to /var/cache/conftool/dbconfig/20240731-081801-root.json
- 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67131 and previous config saved to /var/cache/conftool/dbconfig/20240731-081654-root.json
- 08:16 marostegui: Starting s3 codfw failover from db2205 to db2209 - T371455
- 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Switchover s3
- 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Switchover s3
- 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67130 and previous config saved to /var/cache/conftool/dbconfig/20240731-080148-root.json
- 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67129 and previous config saved to /var/cache/conftool/dbconfig/20240731-080017-root.json
- 07:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 07:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67128 and previous config saved to /var/cache/conftool/dbconfig/20240731-074643-root.json
- 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67127 and previous config saved to /var/cache/conftool/dbconfig/20240731-074512-root.json
- 07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 64049
- 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 64049
- 07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67126 and previous config saved to /var/cache/conftool/dbconfig/20240731-073006-root.json
- 07:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
- 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T371455', diff saved to https://phabricator.wikimedia.org/P67125 and previous config saved to /var/cache/conftool/dbconfig/20240731-071645-root.json
- 07:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
- 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67124 and previous config saved to /var/cache/conftool/dbconfig/20240731-071500-root.json
- 07:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 07:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67123 and previous config saved to /var/cache/conftool/dbconfig/20240731-065955-root.json
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67122 and previous config saved to /var/cache/conftool/dbconfig/20240731-065341-root.json
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67121 and previous config saved to /var/cache/conftool/dbconfig/20240731-065320-root.json
- 06:50 slyngs: Upgrading CAS to version 7.0
- 06:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 06:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
- 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371132', diff saved to https://phabricator.wikimedia.org/P67120 and previous config saved to /var/cache/conftool/dbconfig/20240731-064752-root.json
- 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67119 and previous config saved to /var/cache/conftool/dbconfig/20240731-064449-root.json
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67118 and previous config saved to /var/cache/conftool/dbconfig/20240731-063835-root.json
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67117 and previous config saved to /var/cache/conftool/dbconfig/20240731-063814-root.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67116 and previous config saved to /var/cache/conftool/dbconfig/20240731-062330-root.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67115 and previous config saved to /var/cache/conftool/dbconfig/20240731-062308-root.json
- 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67112 and previous config saved to /var/cache/conftool/dbconfig/20240731-055645-root.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67111 and previous config saved to /var/cache/conftool/dbconfig/20240731-055319-root.json
- 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67110 and previous config saved to /var/cache/conftool/dbconfig/20240731-055256-root.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Make db2127 vslow and remove it as candidate master T371361', diff saved to https://phabricator.wikimedia.org/P67109 and previous config saved to /var/cache/conftool/dbconfig/20240731-055004-marostegui.json
- 05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
- 05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2209 T371361', diff saved to https://phabricator.wikimedia.org/P67108 and previous config saved to /var/cache/conftool/dbconfig/20240731-054653-root.json
- 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67107 and previous config saved to /var/cache/conftool/dbconfig/20240731-054414-marostegui.json
- 05:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 05:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67106 and previous config saved to /var/cache/conftool/dbconfig/20240731-054352-marostegui.json
- 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67105 and previous config saved to /var/cache/conftool/dbconfig/20240731-054140-root.json
- 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67104 and previous config saved to /var/cache/conftool/dbconfig/20240731-053813-root.json
- 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67103 and previous config saved to /var/cache/conftool/dbconfig/20240731-052845-marostegui.json
- 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67102 and previous config saved to /var/cache/conftool/dbconfig/20240731-052634-root.json
- 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67101 and previous config saved to /var/cache/conftool/dbconfig/20240731-052308-root.json
- 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1209 T371368', diff saved to https://phabricator.wikimedia.org/P67100 and previous config saved to /var/cache/conftool/dbconfig/20240731-052216-marostegui.json
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1193 to s8 primary and set section read-write T371368', diff saved to https://phabricator.wikimedia.org/P67099 and previous config saved to /var/cache/conftool/dbconfig/20240731-052114-root.json
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T371368', diff saved to https://phabricator.wikimedia.org/P67098 and previous config saved to /var/cache/conftool/dbconfig/20240731-052036-root.json
- 05:20 marostegui: Starting s8 eqiad failover from db1209 to db1193 - T371368
- 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67097 and previous config saved to /var/cache/conftool/dbconfig/20240731-051339-marostegui.json
- 05:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67096 and previous config saved to /var/cache/conftool/dbconfig/20240731-051129-root.json
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67095 and previous config saved to /var/cache/conftool/dbconfig/20240731-045832-marostegui.json
- 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1193 from API/vslow/dump T371368', diff saved to https://phabricator.wikimedia.org/P67094 and previous config saved to /var/cache/conftool/dbconfig/20240731-045649-root.json
- 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1193 with weight 0 T371368', diff saved to https://phabricator.wikimedia.org/P67093 and previous config saved to /var/cache/conftool/dbconfig/20240731-045631-root.json
- 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67092 and previous config saved to /var/cache/conftool/dbconfig/20240731-045623-root.json
- 04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
- 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1173 T371365', diff saved to https://phabricator.wikimedia.org/P67091 and previous config saved to /var/cache/conftool/dbconfig/20240731-045158-marostegui.json
- 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T371365', diff saved to https://phabricator.wikimedia.org/P67089 and previous config saved to /var/cache/conftool/dbconfig/20240731-044954-root.json
- 04:49 marostegui: Starting s6 eqiad failover from db1173 to db1201 - T371365
- 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1201 from API/vslow/dump T371365', diff saved to https://phabricator.wikimedia.org/P67088 and previous config saved to /var/cache/conftool/dbconfig/20240731-043528-marostegui.json
- 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
- 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1201 with weight 0 T371365', diff saved to https://phabricator.wikimedia.org/P67087 and previous config saved to /var/cache/conftool/dbconfig/20240731-043459-marostegui.json
- 04:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
- 02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67086 and previous config saved to /var/cache/conftool/dbconfig/20240731-022920-marostegui.json
- 02:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 02:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 00:55 eileen: civicrm upgraded from 4d3d2720 to d1f1d7bd
- 00:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1248.eqiad.wmnet with OS bullseye
- 00:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 00:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
2024-07-30
- 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1249.eqiad.wmnet with OS bullseye
- 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1247.eqiad.wmnet with OS bullseye
- 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1246.eqiad.wmnet with OS bullseye
- 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
- 23:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1244.eqiad.wmnet with OS bullseye
- 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
- 23:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
- 23:34 tzatziki: removing 1 file for legal compliance
- 23:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
- 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
- 23:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
- 23:26 tzatziki: removing 1 file for legal compliance
- 23:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1248.eqiad.wmnet with OS bullseye
- 23:25 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
- 23:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
- 23:23 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
- 23:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:17 eileen: civicrm upgraded from 3db16342 to 4d3d2720
- 23:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1241.eqiad.wmnet with OS bullseye
- 23:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1249.eqiad.wmnet with OS bullseye
- 23:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1245.eqiad.wmnet with OS bullseye
- 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1247.eqiad.wmnet with OS bullseye
- 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1243.eqiad.wmnet with OS bullseye
- 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:08 tzatziki: removing 2 files for legal compliance
- 23:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1246.eqiad.wmnet with OS bullseye
- 23:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1242.eqiad.wmnet with OS bullseye
- 23:06 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1244.eqiad.wmnet with OS bullseye
- 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
- 22:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
- 22:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
- 22:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
- 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
- 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
- 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
- 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
- 22:41 eileen: config revision changed from d2484ce6 to e8cc0ed6
- 22:35 eileen: config revision changed from 10ead940 to d2484ce6
- 22:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:32 eileen: civicrm upgraded from 5ac353bd to 3db16342
- 22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1245.eqiad.wmnet with OS bullseye
- 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1242.eqiad.wmnet with OS bullseye
- 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1243.eqiad.wmnet with OS bullseye
- 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1241.eqiad.wmnet with OS bullseye
- 21:53 urbanecm@deploy1003: Finished scap: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870) (duration: 08m 09s)
- 21:45 urbanecm@deploy1003: Started scap sync-world: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870)
- 21:23 cjming@deploy1003: Finished scap: Backport for Deploy MetricsPlatform to beta cluster (T366234) (duration: 11m 41s)
- 21:18 cjming@deploy1003: cjming: Continuing with sync
- 21:14 cjming@deploy1003: cjming: Backport for Deploy MetricsPlatform to beta cluster (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:11 cjming@deploy1003: Started scap sync-world: Backport for Deploy MetricsPlatform to beta cluster (T366234)
- 21:06 cjming@deploy1003: Finished scap: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) (duration: 13m 18s)
- 21:01 cjming@deploy1003: cjming, cscott: Continuing with sync
- 20:58 cjming@deploy1003: cjming, cscott: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:52 cjming@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367)
- 20:48 cjming@deploy1003: Finished scap: Backport for Add NetworkSession extension (T355267) (duration: 45m 08s)
- 20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
- 20:38 cjming@deploy1003: ebernhardson, cjming: Backport for Add NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:16 godog: bounce benthos@webrequest_live.service on centrallog for excessive lag
- 20:06 topranks: re-enable BGP to lvs2011 on lsw1-a2-codfw (restores as primary for traffic) T370891
- 20:03 cjming@deploy1003: Started scap sync-world: Backport for Add NetworkSession extension (T355267)
- 19:58 topranks: rebooting lvs2011 to force new network config T370891
- 19:37 eileen: civicrm upgraded from 5e72c64f to 5ac353bd
- 19:29 topranks: disable BGP to lvs2011 on lsw1-a2-codfw (moves traffic to lvs2014) in advance of vlan change T370891
- 19:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
- 19:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
- 19:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
- 19:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
- 19:21 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: 0.3.145 (duration: 07m 59s)
- 19:13 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: 0.3.145
- 18:53 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.16 refs T366961
- 18:39 topranks: re-enabling BGP to lvs2012 from lsw1-b2-codfw T370862
- 18:33 brennen: 1.43.0-wmf.16 train (T366961): blockers resolved, rolling to group0
- 18:31 brennen@deploy1003: Finished scap: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) (duration: 08m 54s)
- 18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 18:27 topranks: rebooting lvs2012 (again) to force new network config T370862
- 18:26 brennen@deploy1003: brennen, cscott: Continuing with sync
- 18:25 brennen@deploy1003: brennen, cscott: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 18:23 brennen@deploy1003: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126)
- 18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool db1174', diff saved to https://phabricator.wikimedia.org/P67083 and previous config saved to /var/cache/conftool/dbconfig/20240730-181331-ladsgroup.json
- 18:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 18:13 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67082 and previous config saved to /var/cache/conftool/dbconfig/20240730-181242-ladsgroup.json
- 18:05 Dreamy_Jazz: Stopped MediaModeration scanning script on ruwiki
- 17:56 topranks: rebooting lvs2012 to force new network config T370862
- 17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 17:51 hashar@deploy1003: Finished deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407 (duration: 00m 09s)
- 17:50 hashar@deploy1003: Started deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407
- 17:20 topranks: disable BGP to PyBal on lvs2012 from lsw1-b2-codfw (moving traffic to lvs2014)
- 17:18 otto@deploy1003: Finished scap: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817 (duration: 05m 56s)
- 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 17:18 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
- 17:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 17:17 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
- 17:13 otto@deploy1003: Started scap sync-world: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817
- 17:12 topranks: adding row C/D vlans to lsw1-b2-codfw and adding on trunk to lvs2012 T370862
- 16:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 16:08 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 16:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 16:07 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 16:06 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 16:06 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 15:56 akosiaris: restart pybal for parsoid-php removal on lvs1019, lvs2013 T359387
- 15:50 jnuche@deploy1003: Installation of scap version "latest" completed for 213 hosts
- 15:49 jnuche@deploy1003: Installing scap version "latest" for 213 hosts
- 15:48 jnuche@deploy1003: Installing scap version "latest" for 214 hosts
- 15:47 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
- 15:47 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
- 15:20 akosiaris: restart pybal for parsoid-php removal on lvs1020, lvs2014 T359387
- 15:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 15:04 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 15:03 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 15:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
- 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:58 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 14:51 sukhe: [dns7001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
- 14:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: upgrading anycast-hc: T370068]
- 14:48 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 26s)
- 14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
- 14:47 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 15s)
- 14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
- 14:45 urbanecm: mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=enwiki --all --verbose (T370802; log kept at mwmaint1002:/home/urbanecm/revalidateLinkRecommendations-T370802-july-2024.log)
- 14:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 14:36 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 14:36 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 14:35 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 14:35 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
- 14:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:26 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 14:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:22 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:21 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
- 14:21 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
- 14:20 jnuche@deploy1003: Installing scap version "latest" for 3 hosts
- 14:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=97)
- 14:07 jclark@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
- 14:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
- 14:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:58 marostegui: Remove clouddb1021 from zarcillo database T368518
- 13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
- 13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
- 13:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:49 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:48 urbanecm@deploy1003: Finished scap: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) (duration: 16m 12s)
- 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67079 and previous config saved to /var/cache/conftool/dbconfig/20240730-134352-root.json
- 13:43 urbanecm@deploy1003: urbanecm, gergesshamon: Continuing with sync
- 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
- 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 13:34 urbanecm@deploy1003: urbanecm, gergesshamon: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 13:32 urbanecm@deploy1003: Started scap sync-world: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336)
- 13:31 urbanecm@deploy1003: Finished scap: Backport for Update nlwiki AbuseFilter config per consensus (T370605) (duration: 09m 35s)
- 13:30 elukey: deprecate the sre-admins posix group fleetwide (replaced by ops-limited) - T360356
- 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67078 and previous config saved to /var/cache/conftool/dbconfig/20240730-132846-root.json
- 13:26 urbanecm@deploy1003: xxblackburnxx, urbanecm: Continuing with sync
- 13:25 urbanecm@deploy1003: xxblackburnxx, urbanecm: Backport for Update nlwiki AbuseFilter config per consensus (T370605) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:22 urbanecm@deploy1003: Started scap sync-world: Backport for Update nlwiki AbuseFilter config per consensus (T370605)
- 13:21 urbanecm@deploy1003: Finished scap: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 22m 31s)
- 13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
- 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67077 and previous config saved to /var/cache/conftool/dbconfig/20240730-131341-root.json
- 13:13 Dreamy_Jazz: ruwiki scan is set to time out after 5 hours
- 13:13 Dreamy_Jazz: Started MediaModeration scan on ruwiki to catch up on monthly limit
- 13:12 Dreamy_Jazz: Started MediaModeration script after it crashed - https://wikitech.wikimedia.org/wiki/MediaModeration
- 13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
- 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67076 and previous config saved to /var/cache/conftool/dbconfig/20240730-131223-root.json
- 12:58 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
- 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67074 and previous config saved to /var/cache/conftool/dbconfig/20240730-125836-root.json
- 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67073 and previous config saved to /var/cache/conftool/dbconfig/20240730-125717-root.json
- 12:56 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
- 12:56 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
- 12:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
- 12:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
- 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67072 and previous config saved to /var/cache/conftool/dbconfig/20240730-124330-root.json
- 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67071 and previous config saved to /var/cache/conftool/dbconfig/20240730-124212-root.json
- 12:41 urbanecm: mwdebug1001: scap pull to overcome scap issues
- 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67070 and previous config saved to /var/cache/conftool/dbconfig/20240730-122825-root.json
- 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67069 and previous config saved to /var/cache/conftool/dbconfig/20240730-122706-root.json
- 12:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
- 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
- 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1193 T371361', diff saved to https://phabricator.wikimedia.org/P67068 and previous config saved to /var/cache/conftool/dbconfig/20240730-122243-root.json
- 12:21 JustHannah: T371253 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=dewiktionary --logwiki=metawiki 'Gregorjohannes' 'Klegul'
- 12:17 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
- 12:16 urbanecm@deploy1003: sync-world aborted: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 14m 10s)
- 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T371361', diff saved to https://phabricator.wikimedia.org/P67066 and previous config saved to /var/cache/conftool/dbconfig/20240730-121500-root.json
- 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67065 and previous config saved to /var/cache/conftool/dbconfig/20240730-121201-root.json
- 12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
- 12:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
- 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1201 T371361', diff saved to https://phabricator.wikimedia.org/P67064 and previous config saved to /var/cache/conftool/dbconfig/20240730-120805-root.json
- 12:02 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
- 11:54 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 11:52 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 11:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
- 11:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
- 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67062 and previous config saved to /var/cache/conftool/dbconfig/20240730-111622-root.json
- 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67061 and previous config saved to /var/cache/conftool/dbconfig/20240730-111331-root.json
- 11:10 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 11:03 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 11:02 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 11:01 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 11:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:01 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 11:01 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67060 and previous config saved to /var/cache/conftool/dbconfig/20240730-110117-root.json
- 11:00 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:00 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:00 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:59 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:58 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67059 and previous config saved to /var/cache/conftool/dbconfig/20240730-105825-root.json
- 10:56 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:53 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 10:51 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 10:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2227.codfw.wmnet with OS bookworm
- 10:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
- 10:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
- 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67058 and previous config saved to /var/cache/conftool/dbconfig/20240730-104705-root.json
- 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67057 and previous config saved to /var/cache/conftool/dbconfig/20240730-104612-root.json
- 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67056 and previous config saved to /var/cache/conftool/dbconfig/20240730-104318-root.json
- 10:33 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 10:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67054 and previous config saved to /var/cache/conftool/dbconfig/20240730-103200-root.json
- 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67053 and previous config saved to /var/cache/conftool/dbconfig/20240730-103106-root.json
- 10:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
- 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67052 and previous config saved to /var/cache/conftool/dbconfig/20240730-102813-root.json
- 10:21 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67051 and previous config saved to /var/cache/conftool/dbconfig/20240730-101654-root.json
- 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67050 and previous config saved to /var/cache/conftool/dbconfig/20240730-101600-root.json
- 10:14 volans@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67049 and previous config saved to /var/cache/conftool/dbconfig/20240730-101307-root.json
- 10:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 10:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67048 and previous config saved to /var/cache/conftool/dbconfig/20240730-100148-root.json
- 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67047 and previous config saved to /var/cache/conftool/dbconfig/20240730-100055-root.json
- 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67046 and previous config saved to /var/cache/conftool/dbconfig/20240730-095802-root.json
- 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67045 and previous config saved to /var/cache/conftool/dbconfig/20240730-094643-root.json
- 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67044 and previous config saved to /var/cache/conftool/dbconfig/20240730-094549-root.json
- 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67043 and previous config saved to /var/cache/conftool/dbconfig/20240730-094256-root.json
- 09:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
- 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67042 and previous config saved to /var/cache/conftool/dbconfig/20240730-093138-root.json
- 09:29 marostegui: Deploy schema change on db2203 s1 codfw dbmaint T367856
- 09:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 09:26 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 09:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
- 09:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
- 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T371345', diff saved to https://phabricator.wikimedia.org/P67041 and previous config saved to /var/cache/conftool/dbconfig/20240730-091925-marostegui.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T371345', diff saved to https://phabricator.wikimedia.org/P67040 and previous config saved to /var/cache/conftool/dbconfig/20240730-091742-root.json
- 09:10 marostegui: Starting s1 codfw failover from db2203 to db2212 - T371345
- 08:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 08:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67039 and previous config saved to /var/cache/conftool/dbconfig/20240730-084525-root.json
- 08:32 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2216.codfw.wmnet onto db2212.codfw.wmnet
- 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67038 and previous config saved to /var/cache/conftool/dbconfig/20240730-083020-root.json
- 08:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
- 08:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
- 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67037 and previous config saved to /var/cache/conftool/dbconfig/20240730-081515-root.json
- 08:11 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy2002.codfw.wmnet with OS bullseye
- 08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 08:06 marostegui: Update db1224 on zarcillo T371276
- 08:06 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
- 08:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
- 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
- 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371276', diff saved to https://phabricator.wikimedia.org/P67035 and previous config saved to /var/cache/conftool/dbconfig/20240730-080538-root.json
- 08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
- 08:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
- 08:03 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67034 and previous config saved to /var/cache/conftool/dbconfig/20240730-080135-root.json
- 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67033 and previous config saved to /var/cache/conftool/dbconfig/20240730-080010-root.json
- 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67032 and previous config saved to /var/cache/conftool/dbconfig/20240730-074629-root.json
- 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67031 and previous config saved to /var/cache/conftool/dbconfig/20240730-074505-root.json
- 07:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
- 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67030 and previous config saved to /var/cache/conftool/dbconfig/20240730-073124-root.json
- 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67029 and previous config saved to /var/cache/conftool/dbconfig/20240730-072959-root.json
- 07:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
- 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67028 and previous config saved to /var/cache/conftool/dbconfig/20240730-071619-root.json
- 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67027 and previous config saved to /var/cache/conftool/dbconfig/20240730-071454-root.json
- 07:14 godog: finish rolling out benthos 4.27.0-1
- 07:10 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy2002.codfw.wmnet with OS bullseye
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67026 and previous config saved to /var/cache/conftool/dbconfig/20240730-070114-root.json
- 06:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
- 06:56 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2216.codfw.wmnet onto db2212.codfw.wmnet
- 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2216', diff saved to https://phabricator.wikimedia.org/P67025 and previous config saved to /var/cache/conftool/dbconfig/20240730-064853-root.json
- 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P67024 and previous config saved to /var/cache/conftool/dbconfig/20240730-064835-root.json
- 06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
- 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T371345', diff saved to https://phabricator.wikimedia.org/P67023 and previous config saved to /var/cache/conftool/dbconfig/20240730-064128-marostegui.json
- 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67022 and previous config saved to /var/cache/conftool/dbconfig/20240730-052420-root.json
- 05:20 marostegui: Change candidate master in s4 eqiad (this is a NOOP) T371343
- 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67021 and previous config saved to /var/cache/conftool/dbconfig/20240730-050914-root.json
- 05:04 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
- 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
- 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
- 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
- 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
- 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67020 and previous config saved to /var/cache/conftool/dbconfig/20240730-045409-root.json
- 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1238 T371251', diff saved to https://phabricator.wikimedia.org/P67019 and previous config saved to /var/cache/conftool/dbconfig/20240730-045336-marostegui.json
- 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T371251', diff saved to https://phabricator.wikimedia.org/P67018 and previous config saved to /var/cache/conftool/dbconfig/20240730-045104-marostegui.json
- 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T371251', diff saved to https://phabricator.wikimedia.org/P67017 and previous config saved to /var/cache/conftool/dbconfig/20240730-045032-root.json
- 04:50 marostegui: Starting s4 eqiad failover from db1238 to db1160 - T371251
- 04:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67016 and previous config saved to /var/cache/conftool/dbconfig/20240730-043904-root.json
- 04:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67015 and previous config saved to /var/cache/conftool/dbconfig/20240730-042755-marostegui.json
- 04:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 04:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 04:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
- 04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1160 with weight 0 T371251', diff saved to https://phabricator.wikimedia.org/P67014 and previous config saved to /var/cache/conftool/dbconfig/20240730-042528-root.json
- 04:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
- 04:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67013 and previous config saved to /var/cache/conftool/dbconfig/20240730-042358-root.json
- 04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.13 (duration: 06m 51s)
- 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.16 refs T366961
- 02:52 eileen: disabled audit modules (Adyen audit etc.)
- 02:09 eileen: civicrm upgraded from 2837c4e9 to 5e72c64f
- 02:05 eileen: config revision changed from 8e2f7c03 to 10ead940
- 01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67011 and previous config saved to /var/cache/conftool/dbconfig/20240730-010232-marostegui.json
- 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67010 and previous config saved to /var/cache/conftool/dbconfig/20240730-004725-marostegui.json
- 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67009 and previous config saved to /var/cache/conftool/dbconfig/20240730-003218-marostegui.json
- 00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67008 and previous config saved to /var/cache/conftool/dbconfig/20240730-001710-marostegui.json
2024-07-29
- 23:19 eileen: civicrm upgraded from efbb874e to 2837c4e9
- 22:19 eileen: civicrm upgraded from 1dc4f944 to efbb874e
- 21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
- 21:41 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
- 21:38 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
- 21:09 cjming: end of UTC late backport window
- 21:06 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) (duration: 10m 40s)
- 21:00 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
- 21:00 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:55 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046)
- 20:52 cjming@deploy1003: Finished scap: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) (duration: 08m 18s)
- 20:48 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
- 20:48 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
- 20:46 cjming@deploy1003: cjming, jdlrobson: Continuing with sync
- 20:45 cjming@deploy1003: cjming, jdlrobson: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:45 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
- 20:45 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
- 20:43 cjming@deploy1003: Started scap sync-world: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505)
- 20:42 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) (duration: 07m 30s)
- 20:37 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
- 20:36 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:34 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046)
- 20:33 cjming@deploy1003: Finished scap: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) (duration: 07m 59s)
- 20:27 cjming@deploy1003: superzerocool, cjming: Continuing with sync
- 20:27 cjming@deploy1003: superzerocool, cjming: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:25 cjming@deploy1003: Started scap sync-world: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026)
- 20:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 20:15 cjming@deploy1003: Finished scap: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) (duration: 08m 52s)
- 20:10 cjming@deploy1003: nmw03, cjming: Continuing with sync
- 20:08 cjming@deploy1003: nmw03, cjming: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:06 cjming@deploy1003: Started scap sync-world: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186)
- 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 18:58 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2227.codfw.wmnet with OS bookworm
- 17:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 17:51 urbanecm: mwmaint1002: kill extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php for enwiki (T370802)
- 17:50 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 17:26 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
- 17:25 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 17:24 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 17:17 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
- 17:14 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
- 17:14 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm2_amd64.changes T339134
- 16:47 urbanecm@deploy1003: Finished scap: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) (duration: 10m 56s)
- 16:45 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 16:44 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 16:42 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Continuing with sync
- 16:38 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2003.wikimedia.org with OS bookworm
- 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 16:36 urbanecm@deploy1003: Started scap sync-world: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941)
- 16:30 Emperor: restart swift-proxy on ms-fe2011 T360913
- 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
- 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
- 16:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
- 16:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
- 16:01 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
- 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.wikimedia.org with OS bookworm
- 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
- 15:56 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm1_amd64.changes T339134
- 15:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
- 15:55 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:54 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 15:49 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:48 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
- 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
- 15:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
- 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2233.codfw.wmnet with OS bookworm
- 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 15:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:23 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
- 15:18 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:17 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:16 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 15:16 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 15:14 sukhe: running authdns-update after dns2006 depool
- 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
- 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
- 15:10 sukhe: [dns2006] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
- 15:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
- 15:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2006.wikimedia.org [reason: upgrading anycast-hc: T370068]
- 15:02 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 14:59 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
- 14:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 14:57 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
- 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2233.codfw.wmnet with OS bookworm
- 14:45 Lucas_WMDE: UTC afternoon backport+config window done
- 14:41 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) (duration: 07m 58s)
- 14:39 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:37 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Continuing with sync
- 14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:34 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'run-puppet-agent'
- 14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8: T370068
- 14:33 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455)
- 14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8
- 14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2229.codfw.wmnet with OS bookworm
- 14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:24 herron: the grafana default datasource has been changed from graphite to thanos T269333
- 14:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2231.codfw.wmnet with OS bookworm
- 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:21 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 19m 24s)
- 14:21 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 14:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 14:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS bookworm
- 14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2232.codfw.wmnet with OS bookworm
- 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
- 14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2228.codfw.wmnet with OS bookworm
- 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:09 SandraEbele_: rerunning airflow mediawiki_history_check_denormalize dag as downstream task after rerunning mediawiki_history_denormalize dag
- 14:07 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2039.codfw.wmnet),cluster=kubernetes,service=kubesvc [reason: Pooling and uncordoning - T351074]
- 14:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
- 14:02 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
- 14:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
- 14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 14:01 jnuche@deploy1003: Installation of scap version "4.94.0" completed for 210 hosts
- 14:00 jnuche@deploy1003: Installing scap version "4.94.0" for 210 hosts
- 13:59 jnuche@deploy1003: Installing scap version "4.94.0" for 211 hosts
- 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
- 13:56 claime: homer 'cr*codfw*' commit 'T351074'
- 13:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
- 13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
- 13:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
- 13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
- 13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
- 13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
- 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
- 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
- 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
- 13:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
- 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
- 13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
- 13:45 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized php-1.43.0-wmf.15/extensions/ContentTranslation/extension.json: Backport for AX: Unregister "axArticleFooterEntrypointRegistrar" hook handler (T363338) (duration: 06m 36s)
- 13:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 13:41 XioNoX: push new pfw policies - T371137
- 13:36 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2229.codfw.wmnet with OS bookworm
- 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2228.codfw.wmnet with OS bookworm
- 13:33 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
- 13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS bookworm
- 13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 13:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized wmf-config/: Backport for Enable mul language code on Wikidata (limited mode) (T330281) (duration: 06m 47s)
- 13:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2225.codfw.wmnet with OS bookworm
- 13:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
- 13:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
- 13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2226.codfw.wmnet with OS bookworm
- 13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 13:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
- 13:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
- 13:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 13:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
- 13:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2223.codfw.wmnet with OS bookworm
- 13:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 13:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2225.codfw.wmnet with OS bookworm
- 12:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2224.codfw.wmnet with OS bookworm
- 12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
- 12:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS bookworm
- 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2039.codfw.wmnet with OS bullseye
- 12:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 12:48 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
- 12:47 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:46 godog: upgrade and roll-restart benthos@mw_accesslog_sampler on logstash hosts
- 12:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2222.codfw.wmnet with OS bookworm
- 12:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
- 12:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
- 12:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
- 12:35 godog: test benthos 4.27 on logstash1023
- 12:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
- 12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
- 12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
- 12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
- 12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
- 12:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
- 12:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
- 12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
- 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2226.codfw.wmnet with OS bookworm
- 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
- 12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2224.codfw.wmnet with OS bookworm
- 12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2223.codfw.wmnet with OS bookworm
- 12:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
- 12:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS bookworm
- 12:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2039.codfw.wmnet with OS bullseye
- 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2441 to wikikube-worker2039
- 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2039
- 12:06 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:02 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 12:02 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 12:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:59 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 11:51 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2039
- 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
- 11:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
- 11:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 11:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2441 to wikikube-worker2039
- 11:26 akosiaris@deploy1003: Finished scap: check the deployment server after switchover (duration: 32m 28s)
- 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67004 and previous config saved to /var/cache/conftool/dbconfig/20240729-111410-root.json
- 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67003 and previous config saved to /var/cache/conftool/dbconfig/20240729-105904-root.json
- 10:54 akosiaris@deploy1003: Started scap sync-world: check the deployment server after switchover
- 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67002 and previous config saved to /var/cache/conftool/dbconfig/20240729-104358-root.json
- 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67001 and previous config saved to /var/cache/conftool/dbconfig/20240729-102853-root.json
- 10:20 marostegui: Deploy schema change on s7 eqiad master with replication dbmaint T370394
- 10:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67000 and previous config saved to /var/cache/conftool/dbconfig/20240729-101348-root.json
- 10:12 godog: bounce benthos@mw_accesslog_sampler on logstash collectors
- 10:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
- 10:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
- 10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 09:31 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 09:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues (duration: 00m 20s)
- 09:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues
- 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66999 and previous config saved to /var/cache/conftool/dbconfig/20240729-092239-root.json
- 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T367856)', diff saved to https://phabricator.wikimedia.org/P66998 and previous config saved to /var/cache/conftool/dbconfig/20240729-091658-marostegui.json
- 09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66997 and previous config saved to /var/cache/conftool/dbconfig/20240729-091637-marostegui.json
- 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repool 25% of es1032', diff saved to https://phabricator.wikimedia.org/P66996 and previous config saved to /var/cache/conftool/dbconfig/20240729-090953-marostegui.json
- 09:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
- 09:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
- 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66995 and previous config saved to /var/cache/conftool/dbconfig/20240729-090730-root.json
- 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66994 and previous config saved to /var/cache/conftool/dbconfig/20240729-090129-marostegui.json
- 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66992 and previous config saved to /var/cache/conftool/dbconfig/20240729-084622-marostegui.json
- 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66991 and previous config saved to /var/cache/conftool/dbconfig/20240729-083115-marostegui.json
- 07:54 dcausse: closing the backport window
- 07:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 24482
- 07:51 dcausse@deploy1002: Finished scap: Backport for GeoData: add pool counter settings (T370621) (duration: 11m 36s)
- 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1001.eqiad.wmnet
- 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
- 07:46 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
- 07:46 dcausse@deploy1002: dcausse: Continuing with sync
- 07:42 dcausse@deploy1002: dcausse: Backport for GeoData: add pool counter settings (T370621) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:41 brouberol@cumin1002: START - Cookbook sre.dns.netbox
- 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
- 07:39 dcausse@deploy1002: Started scap sync-world: Backport for GeoData: add pool counter settings (T370621)
- 07:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 24482
- 07:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
- 07:34 kartik@deploy1002: Finished scap: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko (duration: 14m 42s)
- 07:34 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1001.eqiad.wmnet
- 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1002.eqiad.wmnet
- 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
- 07:32 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
- 07:29 brouberol@cumin1002: START - Cookbook sre.dns.netbox
- 07:25 kartik@deploy1002: kartik: Continuing with sync
- 07:25 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:25 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1002.eqiad.wmnet
- 07:19 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
- 07:19 kartik@deploy1002: Sync cancelled.
- 07:19 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:03 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
- 06:48 marostegui: Deploy schema change on s4 codfw db2179 dbmaint T367856
- 06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
- 06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
- 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2179 T371205', diff saved to https://phabricator.wikimedia.org/P66990 and previous config saved to /var/cache/conftool/dbconfig/20240729-064405-marostegui.json
- 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2140 to s4 primary T371205', diff saved to https://phabricator.wikimedia.org/P66989 and previous config saved to /var/cache/conftool/dbconfig/20240729-064250-marostegui.json
- 06:42 marostegui: Starting s4 codfw failover from db2179 to db2140 - T371205
- 03:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P66984 and previous config saved to /var/cache/conftool/dbconfig/20240729-030804-marostegui.json
- 03:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 03:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 03:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66983 and previous config saved to /var/cache/conftool/dbconfig/20240729-030742-marostegui.json
- 02:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66982 and previous config saved to /var/cache/conftool/dbconfig/20240729-025235-marostegui.json
- 02:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66981 and previous config saved to /var/cache/conftool/dbconfig/20240729-023728-marostegui.json
- 02:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66980 and previous config saved to /var/cache/conftool/dbconfig/20240729-022221-marostegui.json
2024-07-28
- 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66979 and previous config saved to /var/cache/conftool/dbconfig/20240728-190050-marostegui.json
- 19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66978 and previous config saved to /var/cache/conftool/dbconfig/20240728-190028-marostegui.json
- 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66977 and previous config saved to /var/cache/conftool/dbconfig/20240728-184521-marostegui.json
- 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66976 and previous config saved to /var/cache/conftool/dbconfig/20240728-183013-marostegui.json
- 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66975 and previous config saved to /var/cache/conftool/dbconfig/20240728-181506-marostegui.json
- 04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66974 and previous config saved to /var/cache/conftool/dbconfig/20240728-044200-marostegui.json
- 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66973 and previous config saved to /var/cache/conftool/dbconfig/20240728-042021-marostegui.json
- 04:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66972 and previous config saved to /var/cache/conftool/dbconfig/20240728-042000-marostegui.json
- 04:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66971 and previous config saved to /var/cache/conftool/dbconfig/20240728-040453-marostegui.json
- 03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66970 and previous config saved to /var/cache/conftool/dbconfig/20240728-034946-marostegui.json
- 03:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66969 and previous config saved to /var/cache/conftool/dbconfig/20240728-033440-marostegui.json
2024-07-27
- 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66968 and previous config saved to /var/cache/conftool/dbconfig/20240727-135859-marostegui.json
- 13:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 13:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66967 and previous config saved to /var/cache/conftool/dbconfig/20240727-135838-marostegui.json
- 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66966 and previous config saved to /var/cache/conftool/dbconfig/20240727-134331-marostegui.json
- 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66965 and previous config saved to /var/cache/conftool/dbconfig/20240727-132824-marostegui.json
- 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66964 and previous config saved to /var/cache/conftool/dbconfig/20240727-131316-marostegui.json
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66963 and previous config saved to /var/cache/conftool/dbconfig/20240727-113018-ladsgroup.json
- 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66962 and previous config saved to /var/cache/conftool/dbconfig/20240727-111512-ladsgroup.json
- 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66961 and previous config saved to /var/cache/conftool/dbconfig/20240727-110007-ladsgroup.json
- 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66960 and previous config saved to /var/cache/conftool/dbconfig/20240727-104502-ladsgroup.json
- 10:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
- 10:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
- 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1246, paged', diff saved to https://phabricator.wikimedia.org/P66959 and previous config saved to /var/cache/conftool/dbconfig/20240727-100533-ladsgroup.json
- 07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 07:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66958 and previous config saved to /var/cache/conftool/dbconfig/20240727-070839-marostegui.json
- 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66957 and previous config saved to /var/cache/conftool/dbconfig/20240727-065332-marostegui.json
- 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66956 and previous config saved to /var/cache/conftool/dbconfig/20240727-063824-marostegui.json
- 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66955 and previous config saved to /var/cache/conftool/dbconfig/20240727-062317-marostegui.json
- 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2234.codfw.wmnet with OS bookworm
- 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 01:13 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2233.codfw.wmnet with OS bookworm
- 01:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
- 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
- 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
- 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2234.codfw.wmnet with OS bookworm
- 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2235.codfw.wmnet with OS bookworm
- 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2236.codfw.wmnet with OS bookworm
- 00:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
- 00:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
- 00:20 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66954 and previous config saved to /var/cache/conftool/dbconfig/20240727-002016-ladsgroup.json
- 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2235.codfw.wmnet with OS bookworm
- 00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2237.codfw.wmnet with OS bookworm
- 00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66953 and previous config saved to /var/cache/conftool/dbconfig/20240727-000509-ladsgroup.json
- 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
- 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
- 00:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
2024-07-26
- 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66952 and previous config saved to /var/cache/conftool/dbconfig/20240726-235001-ladsgroup.json
- 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2236.codfw.wmnet with OS bookworm
- 23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
- 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
- 23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2238.codfw.wmnet with OS bookworm
- 23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66951 and previous config saved to /var/cache/conftool/dbconfig/20240726-233648-marostegui.json
- 23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66950 and previous config saved to /var/cache/conftool/dbconfig/20240726-233619-marostegui.json
- 23:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66949 and previous config saved to /var/cache/conftool/dbconfig/20240726-233454-ladsgroup.json
- 23:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2237.codfw.wmnet with OS bookworm
- 23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66948 and previous config saved to /var/cache/conftool/dbconfig/20240726-232112-marostegui.json
- 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
- 23:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
- 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2239.codfw.wmnet with OS bookworm
- 23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 23:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66947 and previous config saved to /var/cache/conftool/dbconfig/20240726-230605-marostegui.json
- 23:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2238.codfw.wmnet with OS bookworm
- 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
- 22:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66946 and previous config saved to /var/cache/conftool/dbconfig/20240726-225058-marostegui.json
- 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
- 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
- 22:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2239.codfw.wmnet with OS bookworm
- 20:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
- 18:52 mutante: [deploy1002:~] $ echo 'https://sep11.wikipedia.org' | mwscript purgeList.php --wiki=aawiki - T367014
- 18:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 18:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
- 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
- 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
- 17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
- 17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
- 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
- 17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
- 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
- 16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
- 16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 16:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
- 16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
- 16:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
- 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
- 16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
- 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
- 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
- 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
- 16:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
- 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
- 16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
- 16:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
- 16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
- 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
- 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
- 16:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
- 16:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
- 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
- 16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
- 16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
- 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
- 16:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
- 16:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:55 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@845502d]: (no justification provided) (duration: 00m 37s)
- 15:55 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@845502d]: (no justification provided)
- 15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66945 and previous config saved to /var/cache/conftool/dbconfig/20240726-153145-ladsgroup.json
- 15:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 15:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 15:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
- 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
- 14:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
- 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
- 14:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
- 14:48 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:42 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 14:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2226']
- 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
- 14:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2226']
- 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
- 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2240.codfw.wmnet with OS bookworm
- 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
- 14:07 dcausse@deploy1002: Finished deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM (duration: 00m 21s)
- 14:07 dcausse@deploy1002: Started deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM
- 14:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
- 14:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
- 13:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
- 13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
- 13:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
- 13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 13:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2240.codfw.wmnet with OS bookworm
- 13:42 elukey: move dump_cloud_ip_ranges's write to /srv/private capabilities back to puppetmaster1001 - T368023
- 13:23 dcausse@deploy1002: Finished deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolytics to fix numpy & pyarrow version conflict (duration: 00m 45s)
- 13:23 dcausse@deploy1002: Started deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolytics to fix numpy & pyarrow version conflict
- 13:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
- 13:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 12:58 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
- 12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 12:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
- 12:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
- 11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
- 11:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 11:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 11:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 11:05 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 10:40 akosiaris@deploy1003: Synchronized .mailmap: Testing a noop deploy from deploy1003 (duration: 20m 28s)
- 10:03 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
- 10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:38 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
- 09:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
- 09:33 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1072.eqiad.wmnet
- 09:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
- 09:21 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
- 09:21 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
- 09:21 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
- 09:21 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
- 09:21 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
- 09:16 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
- 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 09:09 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 09:09 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 09:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 09:09 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 09:09 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: sync
- 09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
- 09:06 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: sync
- 09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: sync
- 09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
- 09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/echostore: sync
- 09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: sync
- 09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
- 09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: sync
- 09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
- 09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: sync
- 09:05 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
- 09:02 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
- 09:02 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
- 09:02 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
- 09:01 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
- 09:01 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
- 09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
- 08:56 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: sync
- 08:55 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: sync
- 08:55 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: sync
- 08:55 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: sync
- 08:55 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
- 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66942 and previous config saved to /var/cache/conftool/dbconfig/20240726-085529-marostegui.json
- 08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 08:55 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
- 08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66941 and previous config saved to /var/cache/conftool/dbconfig/20240726-085507-marostegui.json
- 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66940 and previous config saved to /var/cache/conftool/dbconfig/20240726-083959-marostegui.json
- 08:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 08:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
- 08:25 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66939 and previous config saved to /var/cache/conftool/dbconfig/20240726-082452-marostegui.json
- 08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 08:16 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 08:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
- 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66938 and previous config saved to /var/cache/conftool/dbconfig/20240726-080945-marostegui.json
- 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66937 and previous config saved to /var/cache/conftool/dbconfig/20240726-074330-marostegui.json
- 07:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66936 and previous config saved to /var/cache/conftool/dbconfig/20240726-074308-marostegui.json
- 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66935 and previous config saved to /var/cache/conftool/dbconfig/20240726-072801-marostegui.json
- 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66934 and previous config saved to /var/cache/conftool/dbconfig/20240726-071254-marostegui.json
- 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66933 and previous config saved to /var/cache/conftool/dbconfig/20240726-065747-marostegui.json
- 06:56 XioNoX: continue rolling out "LVS-and-NS-service-ips" prefix-list rename to network device
- 00:47 ladsgroup@deploy1002: Finished scap: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) (duration: 09m 49s)
- 00:42 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 00:40 ladsgroup@deploy1002: ladsgroup: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:37 ladsgroup@deploy1002: Started scap sync-world: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156)
2024-07-25
- 23:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 23:05 ladsgroup@deploy1002: ladsgroup: Backport for Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:03 ladsgroup@deploy1002: Started scap sync-world: Backport for Add CSS class to watchlist pending notice (T191156)
- 22:56 ladsgroup@deploy1002: Finished scap: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) (duration: 10m 08s)
- 22:50 ladsgroup@deploy1002: ladsgroup, umherirrender: Continuing with sync
- 22:48 ladsgroup@deploy1002: ladsgroup, umherirrender: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:46 ladsgroup@deploy1002: Started scap sync-world: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052)
- 22:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
- 22:10 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
- 22:04 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
- 22:04 eoghan@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
- 22:03 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
- 22:00 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
- 21:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
- 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
- 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
- 21:54 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
- 21:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2224']
- 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2225']
- 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2223']
- 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2222']
- 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2221']
- 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
- 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
- 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
- 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
- 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2225']
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2224']
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2223']
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2222']
- 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2221']
- 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
- 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
- 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
- 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
- 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
- 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
- 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
- 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
- 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
- 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
- 21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
- 21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
- 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
- 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
- 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
- 21:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
- 21:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:16 cstone: payments-wiki upgraded from a37746fe to 91624a2e
- 19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
- 18:59 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 18:26 pt1979@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
- 18:12 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.15 refs T366960
- 18:10 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
- 18:07 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
- 18:05 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 17:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 17:56 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 17:32 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 17:20 swfrench@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1032.eqiad.wmnet),cluster=kubernetes,service=kubesvc [reason: T351074 - pooling after reimage]
- 17:08 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1032.eqiad.wmnet with OS bullseye
- 17:06 swfrench-wmf: running homer 'cr*eqiad*' commit 'T351074' for k8s worker reimage
- 17:03 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25 (duration: 00m 25s)
- 17:03 ebernhardson@deploy1002: Started deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25
- 16:48 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
- 16:46 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics (duration: 00m 19s)
- 16:46 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics
- 16:45 swfrench@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
- 16:45 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 16:30 swfrench@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1032.eqiad.wmnet with OS bullseye
- 16:29 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1032.eqiad.wmnet on all recursors
- 16:29 swfrench@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1032.eqiad.wmnet on all recursors
- 16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:25 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1364 to wikikube-worker1032
- 16:24 swfrench@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1032
- 16:24 swfrench@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1032
- 16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
- 16:21 swfrench@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
- 16:18 swfrench@cumin1002: START - Cookbook sre.dns.netbox
- 16:18 swfrench@cumin1002: START - Cookbook sre.hosts.rename from mw1364 to wikikube-worker1032
- 16:17 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:07 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 15:15 elukey: upgrade spicerack to 8.9.0 on cumin nodes
- 15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66930 and previous config saved to /var/cache/conftool/dbconfig/20240725-150739-marostegui.json
- 15:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 15:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66929 and previous config saved to /var/cache/conftool/dbconfig/20240725-150717-marostegui.json
- 14:53 elukey: uploaded spicerack_8.9.0 to apt.wikimedia.org bullseye-wikimedia
- 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66928 and previous config saved to /var/cache/conftool/dbconfig/20240725-145210-marostegui.json
- 14:51 sukhe: running authdns-update after dns4003 depool
- 14:48 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
- 14:46 sukhe: [dns4003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
- 14:44 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org [reason: upgrading anycast-hc: T370068]
- 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66926 and previous config saved to /var/cache/conftool/dbconfig/20240725-143703-marostegui.json
- 14:36 dcausse@deploy1002: Finished deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support (duration: 00m 20s)
- 14:36 dcausse@deploy1002: Started deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support
- 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66925 and previous config saved to /var/cache/conftool/dbconfig/20240725-142155-marostegui.json
- 14:19 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
- 14:12 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
- 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
- 14:04 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
- 14:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
- 14:04 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
- 14:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
- 14:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
- 14:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
- 13:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
- 13:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
- 13:53 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: sync
- 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
- 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
- 13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
- 13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
- 13:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
- 13:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
- 13:48 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
- 13:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
- 13:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
- 13:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
- 13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 13:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 13:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 13:45 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 13:45 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 13:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 13:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 13:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 13:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 13:41 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 for missed upgrades - T369011]
- 13:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1051.eqiad.wmnet
- 13:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
- 13:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
- 13:30 Lucas_WMDE: UTC afternoon backport+config window done
- 13:30 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Cordoning kubernetes1051 for missed upgrades - T369011]
- 13:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add wikibase client interaction stream (T370045) (duration: 07m 56s)
- 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
- 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Backport for Add wikibase client interaction stream (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add wikibase client interaction stream (T370045)
- 13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable optional MathJax rendering in everywhere (T370507) (duration: 09m 57s)
- 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 13:15 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Continuing with sync
- 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Backport for Enable optional MathJax rendering in everywhere (T370507) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable optional MathJax rendering in everywhere (T370507)
- 13:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
- 12:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 12:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 12:29 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 12:28 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 12:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 12:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 12:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 12:26 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 12:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 12:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 12:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 12:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 12:24 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 12:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 12:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 12:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 12:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 12:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 12:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 12:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 12:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 12:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 12:18 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 12:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 12:16 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 12:16 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 12:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 12:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 12:14 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 12:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 12:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 12:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 12:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 12:08 cgoubert@deploy1002: sync-world aborted: Deploying mpic envoy listener - 1056163 - T366234 (duration: 17m 59s)
- 11:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 11:51 cgoubert@deploy1002: Started scap sync-world: Deploying mpic envoy listener - 1056163 - T366234
- 11:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 10:42 elukey: upload docker-report 0.0.15 to bullseye-wikimedia and upgrade build2001
- 10:00 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 - T369011]
- 09:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:54 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 09:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:19 elukey: move dump_cloud_ip_ranges from puppetmaster1001 to puppetserver1001 - T368023
- 07:38 kart_: Updated cxserver to 2024-07-22-050142-production (T363968)
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66924 and previous config saved to /var/cache/conftool/dbconfig/20240725-073742-marostegui.json
- 07:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 07:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66923 and previous config saved to /var/cache/conftool/dbconfig/20240725-073720-marostegui.json
- 07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 07:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 07:36 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 07:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 07:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 07:35 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66922 and previous config saved to /var/cache/conftool/dbconfig/20240725-072213-marostegui.json
- 07:14 XioNoX: add transit BGP session to KPN in esams
- 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66921 and previous config saved to /var/cache/conftool/dbconfig/20240725-070706-marostegui.json
- 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66920 and previous config saved to /var/cache/conftool/dbconfig/20240725-065159-marostegui.json
- 00:43 zabe@deploy1002: Finished scap: Backport for Further configs for cswikivoyage (T370913) (duration: 08m 22s)
- 00:39 zabe@deploy1002: zabe: Continuing with sync
- 00:37 zabe@deploy1002: zabe: Backport for Further configs for cswikivoyage (T370913) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 00:35 zabe@deploy1002: Started scap sync-world: Backport for Further configs for cswikivoyage (T370913)
- 00:11 eileen: civicrm upgraded from c656ab2f to 1dc4f944
- 00:00 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 00:00 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 00:00 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
2024-07-24
- 23:59 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 23:59 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 23:59 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 23:20 zabe@deploy1002: Finished scap: update interwiki cache (duration: 08m 25s)
- 23:11 zabe@deploy1002: Started scap sync-world: update interwiki cache
- 23:09 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=cswikivoyage --cluster=all 2>&1 | tee /tmp/cswikivoyage.UpdateSearchIndexConfig.log # T370905
- 23:08 zabe@deploy1002: Finished scap: T370905 (duration: 09m 14s)
- 23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66919 and previous config saved to /var/cache/conftool/dbconfig/20240724-230209-marostegui.json
- 23:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 22:59 zabe@deploy1002: Started scap sync-world: T370905
- 22:59 zabe: Create Wikivoyage Czech # T370905
- 22:42 ejegg: re-enabled Adyen job runner
- 22:41 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e across all frack servers
- 22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig1002 only
- 22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig2001 only
- 22:33 ejegg: disabled Adyen job runner
- 21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1021.eqiad.wmnet
- 21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1020.eqiad.wmnet
- 21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1019.eqiad.wmnet
- 21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1018.eqiad.wmnet
- 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1021.eqiad.wmnet
- 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1020.eqiad.wmnet
- 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1019.eqiad.wmnet
- 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1018.eqiad.wmnet
- 21:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
- 21:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
- 21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1014.eqiad.wmnet
- 21:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
- 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1014.eqiad.wmnet
- 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
- 21:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
- 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
- 21:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2007.codfw.wmnet
- 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
- 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
- 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1012.eqiad.wmnet
- 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
- 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
- 21:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash (exit_code=0) rolling reboot on A:apifeatureusage
- 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
- 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
- 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
- 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
- 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
- 21:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1013.eqiad.wmnet
- 21:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
- 21:40 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
- 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1013.eqiad.wmnet
- 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1012.eqiad.wmnet
- 21:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
- 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
- 21:35 ryankemper@cumin2002: START - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash rolling reboot on A:apifeatureusage
- 21:32 ebernhardson@deploy1002: Finished scap: Backport for Check the output of RevisionStore::getRevisionById (T370770) (duration: 12m 07s)
- 21:28 ebernhardson@deploy1002: ebernhardson: Continuing with sync
- 21:26 ebernhardson@deploy1002: ebernhardson: Backport for Check the output of RevisionStore::getRevisionById (T370770) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:20 ebernhardson@deploy1002: Started scap sync-world: Backport for Check the output of RevisionStore::getRevisionById (T370770)
- 21:17 zabe@deploy1002: Finished scap: Backport for Create dark mode launch banner for Vector 2022 (T370303) (duration: 41m 44s)
- 21:11 zabe@deploy1002: jdrewniak, zabe: Continuing with sync
- 21:07 zabe@deploy1002: jdrewniak, zabe: Backport for Create dark mode launch banner for Vector 2022 (T370303) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 20:36 zabe@deploy1002: Started scap sync-world: Backport for Create dark mode launch banner for Vector 2022 (T370303)
- 20:24 sergi0: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=frwiktionary #T369711
- 20:23 sergi0: sgimeno@mwmaint1002:~$ mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=dewiki --force
- 20:18 zabe@deploy1002: Finished scap: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) (duration: 09m 43s)
- 20:13 zabe@deploy1002: zabe, sgimeno: Continuing with sync
- 20:11 zabe@deploy1002: zabe, sgimeno: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:08 zabe@deploy1002: Started scap sync-world: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711)
- 19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 18:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.15 refs T366960
- 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 17:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 17:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 17:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 16:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 16:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 16:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 16:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
- 16:38 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:33 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'systemctl restart anycast-healthchecker.service'
- 15:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 15:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 15:30 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
- 15:24 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
- 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2017.codfw.wmnet with OS bookworm
- 15:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
- 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3010.esams.wmnet
- 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
- 15:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
- 15:01 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs3010.esams.wmnet
- 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc2017.codfw.wmnet with OS bookworm
- 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 14:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" (duration: 09m 37s)
- 14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
- 14:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards"
- 14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
- 14:35 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
- 14:33 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:32 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
- 14:31 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:30 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
- 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
- 14:29 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:28 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
- 14:27 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
- 14:27 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:26 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:26 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:26 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 14:25 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 14:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 14:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 14:24 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 14:24 sukhe: upgrade O:durum to anycast-hc 0.9.8-1+wmf12u2
- 14:22 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 14:22 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
- 14:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
- 14:20 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:19 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:19 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:18 sukhe: disable puppet on O:durum
- 14:18 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:16 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:15 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:14 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
- 14:10 Lucas_WMDE: UTC afternoon backport+config window done
- 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 11m 21s)
- 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Continuing with sync
- 14:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
- 13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
- 13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 10m 21s)
- 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Continuing with sync
- 13:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
- 13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
- 13:37 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
- 13:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 13:28 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u2_amd64.changes: T370068
- 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for knwikisource: Enable local uploads (T370765) (duration: 10m 14s)
- 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
- 13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for knwikisource: Enable local uploads (T370765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for knwikisource: Enable local uploads (T370765)
- 13:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
- 12:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
- 12:31 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided) (duration: 00m 30s)
- 12:31 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided)
- 11:11 dreamyjazz@deploy1002: Finished scap: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) (duration: 07m 27s)
- 11:06 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
- 11:06 dreamyjazz@deploy1002: dreamyjazz: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:03 dreamyjazz@deploy1002: Started scap sync-world: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856)
- 11:00 jiji@deploy1002: Finished scap: Noop, bumping mediawiki chart version (duration: 02m 32s)
- 10:57 jiji@deploy1002: Started scap sync-world: Noop, bumping mediawiki chart version
- 10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 10:54 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 10:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 10:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 10:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 10:16 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
- 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
- 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
- 06:54 XioNoX: deploy CR1056198 Rename LVS-service-IPs prefix-list
- 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66908 and previous config saved to /var/cache/conftool/dbconfig/20240724-060142-marostegui.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66907 and previous config saved to /var/cache/conftool/dbconfig/20240724-054635-marostegui.json
- 05:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66906 and previous config saved to /var/cache/conftool/dbconfig/20240724-053128-marostegui.json
- 05:12 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
- 01:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc2017.codfw.wmnet with OS bookworm
- 00:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
2024-07-23
- 23:58 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
- 23:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
- 23:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2017']
- 23:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
- 23:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['pc2017']
- 23:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
- 23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
- 23:23 eileen: civicrm upgraded from 4247715d to c656ab2f
- 23:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
- 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
- 23:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
- 23:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 23:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
- 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66905 and previous config saved to /var/cache/conftool/dbconfig/20240723-223855-ladsgroup.json
- 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66904 and previous config saved to /var/cache/conftool/dbconfig/20240723-223826-ladsgroup.json
- 22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66903 and previous config saved to /var/cache/conftool/dbconfig/20240723-223742-ladsgroup.json
- 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66902 and previous config saved to /var/cache/conftool/dbconfig/20240723-222349-ladsgroup.json
- 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66901 and previous config saved to /var/cache/conftool/dbconfig/20240723-222320-ladsgroup.json
- 22:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 22:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66900 and previous config saved to /var/cache/conftool/dbconfig/20240723-222236-ladsgroup.json
- 22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
- 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66899 and previous config saved to /var/cache/conftool/dbconfig/20240723-220844-ladsgroup.json
- 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66898 and previous config saved to /var/cache/conftool/dbconfig/20240723-220815-ladsgroup.json
- 22:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66897 and previous config saved to /var/cache/conftool/dbconfig/20240723-220731-ladsgroup.json
- 22:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
- 22:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66896 and previous config saved to /var/cache/conftool/dbconfig/20240723-215338-ladsgroup.json
- 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66895 and previous config saved to /var/cache/conftool/dbconfig/20240723-215309-ladsgroup.json
- 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66894 and previous config saved to /var/cache/conftool/dbconfig/20240723-215225-ladsgroup.json
- away: UTC late deploys done
- 20:53 tgr@deploy1002: Finished scap: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) (duration: 09m 34s)
- 20:48 tgr@deploy1002: wmde-fisch, tgr: Continuing with sync
- 20:46 tgr@deploy1002: wmde-fisch, tgr: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:44 tgr@deploy1002: Started scap sync-world: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585)
- 20:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 20:38 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 20:22 tgr@deploy1002: Finished scap: Backport for debug: Enable Special:WikimediaDebug (T350094) (duration: 09m 28s)
- 20:16 tgr@deploy1002: tgr: Continuing with sync
- 20:14 tgr@deploy1002: tgr: Backport for debug: Enable Special:WikimediaDebug (T350094) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:12 tgr@deploy1002: Started scap sync-world: Backport for debug: Enable Special:WikimediaDebug (T350094)
- 18:59 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@01e1952]: (no justification provided) (duration: 00m 30s)
- 18:58 milimetric@deploy1002: Started deploy [airflow-dags/analytics@01e1952]: (no justification provided)
- 18:45 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/*.err to clear pybal icinga alerts after T367949
- 18:42 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949
- 18:40 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 18:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960
- 18:13 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949
- 18:12 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet
- 18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949
- 18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa
- 18:10 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 18:10 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
- 18:08 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
- 18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
- 18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 17:58 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949
- 17:51 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949
- 17:46 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s)
- 17:46 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided)
- 17:44 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949
- 17:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet
- 17:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet
- 17:40 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
- 17:37 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 17:33 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
- 17:28 swfrench-wmf: run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949
- 17:17 swfrench-wmf: run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949
- 17:06 swfrench-wmf: ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949
- 17:04 dancy@deploy1002: sync-world aborted: testing (duration: 00m 51s)
- 17:03 dancy@deploy1002: Started scap sync-world: testing
- 17:02 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 16:59 jhathaway: applying varnish change on cp4037, 1030591
- 16:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 16:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 16:16 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 16:14 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet
- 16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 15:52 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet
- 15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:24 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998]
- 15:24 Emperor: moss-be1003 out of maintenance mode after network downtime T365998
- 15:22 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
- 15:22 claime: Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998
- 15:20 andrewbogott: find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense
- 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s)
- 15:06 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776
- 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s)
- 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op)
- 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s)
- 15:03 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776
- 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
- 15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
- 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
- 15:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
- 15:00 topranks: rebooting lsw1-f3-eqiad to complete JunOS upgrade (T365998)
- 14:59 XioNoX: deploy CR1055546 border-in: remove authdns filter
- 14:59 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 14:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 14:54 Emperor: moss-be1003 into maintenance mode for network downtime T365998
- 14:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
- 14:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
- 14:10 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
- 14:10 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'run-puppet-agent "merging CR #1041705"'
- 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 14:03 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
- 14:03 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 13:58 Lucas_WMDE: UTC afternoon backport+config window done
- 13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) (duration: 09m 24s)
- 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 13:51 XioNoX: deploy CR1055544 border-in: remove squid and nrpe filters, expand LVS filter
- 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:50 sukhe: running authdns-update after dns6001 depool
- 13:50 XioNoX: deploy CR1055543: border-in: remove git-ssh term
- 13:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:49 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
- 13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 13:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396)
- 13:44 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet "merging CR #1041705"'
- 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
- 13:38 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow7001.magru.wmnet
- 13:37 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
- 13:34 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow7001.magru.wmnet
- 13:34 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
- 13:31 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
- 13:30 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
- 13:29 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
- 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [arwiki] Enable the CampaignEvents extension (T370066) (duration: 19m 17s)
- 13:24 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
- 13:23 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
- 13:22 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3003.esams.wmnet
- 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Continuing with sync
- 13:16 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3003.esams.wmnet
- 13:15 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
- 13:11 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
- 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Backport for [arwiki] Enable the CampaignEvents extension (T370066) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
- 13:05 claime: Cordoning dse-k8s-worker1008.eqiad.wmnet for T365998
- 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [arwiki] Enable the CampaignEvents extension (T370066)
- 11:28 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc
- 11:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:19 claime: Lowered concurrency of RecordLint job to 50 - T370304
- 11:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:16 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:15 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:51 Amir1: running "delete from linter where linter_cat = 23 limit 1000;" in a loop in mwmaint (T370304)
- 10:39 claime: Cordoning kubernetes1025.eqiad.wmnet kubernetes1026.eqiad.wmnet kubernetes1052.eqiad.wmnet kubernetes1053.eqiad.wmnet kubernetes1054.eqiad.wmnet kubernetes1055.eqiad.wmnet kubernetes1056.eqiad.wmnet mw1496.eqiad.wmnet for T365998
- 10:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:12 dreamyjazz@deploy1002: Finished scap: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) (duration: 11m 29s)
- 09:07 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
- 09:07 dreamyjazz@deploy1002: dreamyjazz: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:05 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 09:01 dreamyjazz@deploy1002: Started scap sync-world: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457)
- 08:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 08:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 07:22 kartik@deploy1002: Finished scap: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) (duration: 13m 37s)
- 07:17 kartik@deploy1002: kartik: Continuing with sync
- 07:15 kartik@deploy1002: kartik: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:08 kartik@deploy1002: Started scap sync-world: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387)
- 06:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:58 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66892 and previous config saved to /var/cache/conftool/dbconfig/20240723-050042-marostegui.json
- 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66891 and previous config saved to /var/cache/conftool/dbconfig/20240723-050004-marostegui.json
- 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66890 and previous config saved to /var/cache/conftool/dbconfig/20240723-044457-marostegui.json
- 04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66889 and previous config saved to /var/cache/conftool/dbconfig/20240723-042950-marostegui.json
- 04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66888 and previous config saved to /var/cache/conftool/dbconfig/20240723-041442-marostegui.json
- 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.12 (duration: 01m 00s)
- 03:54 mwpresync@deploy1002: Finished scap: testwikis to 1.43.0-wmf.15 refs T366960 (duration: 51m 50s)
- 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis to 1.43.0-wmf.15 refs T366960
- 01:28 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 01:27 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 01:27 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 01:27 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 01:27 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 01:27 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 00:22 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
- 00:22 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
- 00:05 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2003.codfw.wmnet
- 00:02 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2003.codfw.wmnet
- 00:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003
- 00:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003
2024-07-22
- 23:08 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set lsw in codfw to active - cmooney@cumin1002"
- 23:07 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set lsw in codfw to active - cmooney@cumin1002"
- 23:05 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 22:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 22:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 22:37 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic110[0-2]* for T348977 - bking@cumin2002
- 22:36 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic110[0-2]* for T348977 - bking@cumin2002
- 22:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T348977
- 22:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T348977
- 22:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 21:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 21:30 catrope@deploy1002: Finished scap: Backport for Do not unreview pages when they are moved (T370593) (duration: 20m 27s)
- 21:25 catrope@deploy1002: catrope, soda: Continuing with sync
- 21:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new pfw3-codfw mgmt IP - cmooney@cumin1002"
- 21:12 catrope@deploy1002: catrope, soda: Backport for Do not unreview pages when they are moved (T370593) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:10 catrope@deploy1002: Started scap sync-world: Backport for Do not unreview pages when they are moved (T370593)
- 21:09 catrope@deploy1002: Finished scap: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669) (duration: 19m 12s)
- 21:04 catrope@deploy1002: catrope, matmarex: Continuing with sync
- 20:52 catrope@deploy1002: catrope, matmarex: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:50 catrope@deploy1002: Started scap sync-world: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669)
- 20:49 catrope@deploy1002: Finished scap: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611) (duration: 08m 27s)
- 20:47 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new pfw3-codfw mgmt IP - cmooney@cumin1002"
- 20:46 topranks: applying additional address to pfw3-codfw reth0.2140 to provide space for new hosts (T370164)
- 20:44 catrope@deploy1002: catrope, migr: Continuing with sync
- 20:43 catrope@deploy1002: catrope, migr: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:40 catrope@deploy1002: Started scap sync-world: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611)
- 20:40 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 20:12 catrope@deploy1002: Finished scap: Backport for Work around T370517 by remapping the affected i18n message (T370517) (duration: 08m 24s)
- 20:07 catrope@deploy1002: catrope: Continuing with sync
- 20:06 catrope@deploy1002: catrope: Backport for Work around T370517 by remapping the affected i18n message (T370517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 catrope@deploy1002: Started scap sync-world: Backport for Work around T370517 by remapping the affected i18n message (T370517)
- 19:54 dancy@deploy1002: Finished scap: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115) (duration: 20m 22s)
- 19:47 dancy@deploy1002: dancy: Continuing with sync
- 19:47 dancy@deploy1002: dancy: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:34 dancy@deploy1002: Started scap sync-world: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115)
- 18:36 ejegg: civicrm upgraded from a9ef8ab9 to 4247715d
- 18:27 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts2001.codfw.wmnet
- 18:27 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
- 18:13 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts2001.codfw.wmnet
- 18:12 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
- 18:12 aokoth@cumin1002: END (ERROR) - Cookbook sre.vrts.upgrade (exit_code=97) on VRTS host vrts2001.codfw.wmnet
- 18:12 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
- 17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new cloudceph nodes - cmooney@cumin1002"
- 17:41 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new cloudceph nodes - cmooney@cumin1002"
- 17:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:32 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 17:11 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 17:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 17:09 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 17:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 17:09 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 17:09 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 17:08 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 16:37 sukhe: [doh1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
- 16:32 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2035.codfw.wmnet|wikikube-worker2036.codfw.wmnet|wikikube-worker2037.codfw.wmnet|wikikube-worker2038.codfw.wmnet),cluster=kubernetes,service=kubesvc
- 16:31 claime: Pooling and uncordoning wikikube-worker2035.codfw.wmnet wikikube-worker2036.codfw.wmnet wikikube-worker2037.codfw.wmnet wikikube-worker2038.codfw.wmnet - T351074
- 16:31 sukhe: restart anycast-hc on durum1001
- 16:13 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet
- 16:08 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet
- 16:02 elukey: remove /srv/kafka/data/eqiad.resource-purge-3 on kafka-main2001 to force a refetch of data from good replicas and circumvent data corruption - T370574
- 15:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk
- 15:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk
- 15:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk
- 15:49 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk
- 15:08 dancy@deploy1002: Finished scap: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115) (duration: 09m 10s)
- 15:03 dancy@deploy1002: dancy: Continuing with sync
- 15:01 dancy@deploy1002: dancy: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:59 dancy@deploy1002: Started scap sync-world: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115)
- 14:26 zabe@deploy1002: Finished scap: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263) (duration: 10m 54s)
- 14:21 zabe@deploy1002: zabe: Continuing with sync
- 14:17 zabe@deploy1002: zabe: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:15 zabe@deploy1002: Started scap sync-world: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263)
- 14:08 tchanders@deploy1002: Finished scap: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) (duration: 07m 11s)
- 14:03 tchanders@deploy1002: tchanders: Continuing with sync
- 14:03 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:01 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895)
- 13:45 tchanders@deploy1002: tchanders: Continuing with sync
- 13:42 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895)
- 13:29 tchanders@deploy1002: Sync cancelled.
- 13:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
- 13:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
- 13:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox1002.eqiad.wmnet with reason: Netbox 3 silencing
- 13:20 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox1002.eqiad.wmnet with reason: Netbox 3 silencing
- 13:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox2002.codfw.wmnet with reason: Netbox 3 silencing
- 13:20 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox2002.codfw.wmnet with reason: Netbox 3 silencing
- 13:13 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:11 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895)
- 13:07 claime: power cycling rdb1014.eqiad.wmnet
- 12:22 godog: restore retention.ms=172800000 for mediawiki.httpd.accesslog
- 11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 11:17 ladsgroup@deploy1002: Finished scap: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235) (duration: 08m 02s)
- 11:12 ladsgroup@deploy1002: ebrahim, ladsgroup: Continuing with sync
- 11:11 ladsgroup@deploy1002: ebrahim, ladsgroup: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:09 ladsgroup@deploy1002: Started scap sync-world: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235)
- 10:33 volans: upgraded manually prometheus-ipmi-exporter to v 1.8.0-1~wmf12+1 on db1179 (leftover because was down) T368088
- 10:32 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/updateMetrics.php --wiki=commonswiki --verbose`
- 10:28 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 10:24 elukey: kafka preferred-replica-election on kafka-main - T370574
- 09:51 godog: set mediawiki.httpd.accesslog topic retention to 26h temporarily
- 09:50 mlitn@deploy1002: Finished scap: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774) (duration: 08m 19s)
- 09:45 mlitn@deploy1002: cparle, mlitn: Continuing with sync
- 09:44 mlitn@deploy1002: cparle, mlitn: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:42 mlitn@deploy1002: Started scap sync-world: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774)
- 09:40 claime: homer 'cr*codfw*' commit 'T351074'
- 09:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
- 09:21 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
- 09:03 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
- 09:00 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
- 08:56 godog: rebalance mediawiki.httpd.accesslog partitions across brokers - T370129
- 08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
- 08:50 ayounsi@cumin1002: START - Cookbook sre.postgresql.postgres-init
- 08:32 elukey: restart kafka on kafka-main2005 - T370574
- 08:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt
- 08:30 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt
- 08:24 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 08:23 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 08:07 elukey: restart kafka on kafka-main2001 - T370574
- 08:06 elukey: restart kafka on kafka-main2001 - sre.hosts.downtime
- 08:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt
- 08:05 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt
- 08:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts karapace1002.eqiad.wmnet
- 08:00 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1002.eqiad.wmnet
- 07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 07:39 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 07:35 stran@deploy1002: Finished scap: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500) (duration: 12m 18s)
- 07:30 stran@deploy1002: stran: Continuing with sync
- 07:25 stran@deploy1002: stran: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:23 stran@deploy1002: Started scap sync-world: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500)
- 07:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 07:12 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 02:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66880 and previous config saved to /var/cache/conftool/dbconfig/20240722-025552-marostegui.json
- 02:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 02:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 02:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66879 and previous config saved to /var/cache/conftool/dbconfig/20240722-025530-marostegui.json
- 02:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P66878 and previous config saved to /var/cache/conftool/dbconfig/20240722-024023-marostegui.json
- 02:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P66877 and previous config saved to /var/cache/conftool/dbconfig/20240722-022516-marostegui.json
- 02:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66876 and previous config saved to /var/cache/conftool/dbconfig/20240722-021009-marostegui.json
- 01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66875 and previous config saved to /var/cache/conftool/dbconfig/20240722-015302-ladsgroup.json
- 01:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66874 and previous config saved to /var/cache/conftool/dbconfig/20240722-013756-ladsgroup.json
- 01:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66873 and previous config saved to /var/cache/conftool/dbconfig/20240722-012251-ladsgroup.json
- 01:19 ladsgroup@deploy1002: Finished scap: Backport for Stop storing missing-image-alt-text lints (T370304) (duration: 08m 48s)
- 01:13 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 01:13 ladsgroup@deploy1002: ladsgroup: Backport for Stop storing missing-image-alt-text lints (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 01:10 ladsgroup@deploy1002: Started scap sync-world: Backport for Stop storing missing-image-alt-text lints (T370304)
- 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66872 and previous config saved to /var/cache/conftool/dbconfig/20240722-010745-ladsgroup.json
2024-07-21
- 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66871 and previous config saved to /var/cache/conftool/dbconfig/20240721-232234-marostegui.json
- 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66870 and previous config saved to /var/cache/conftool/dbconfig/20240721-230727-marostegui.json
- 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66869 and previous config saved to /var/cache/conftool/dbconfig/20240721-225219-marostegui.json
- 22:44 ladsgroup@deploy1002: Finished scap: Backport for Disable missing-image-alt-text lint (T370304) (duration: 26m 27s)
- 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66868 and previous config saved to /var/cache/conftool/dbconfig/20240721-223712-marostegui.json
- 22:36 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 22:35 ladsgroup@deploy1002: ladsgroup: Backport for Disable missing-image-alt-text lint (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 22:18 ladsgroup@deploy1002: Started scap sync-world: Backport for Disable missing-image-alt-text lint (T370304)
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66867 and previous config saved to /var/cache/conftool/dbconfig/20240721-085853-marostegui.json
- 08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66866 and previous config saved to /var/cache/conftool/dbconfig/20240721-085832-marostegui.json
- 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66865 and previous config saved to /var/cache/conftool/dbconfig/20240721-084325-marostegui.json
- 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66864 and previous config saved to /var/cache/conftool/dbconfig/20240721-082818-marostegui.json
- 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66863 and previous config saved to /var/cache/conftool/dbconfig/20240721-081310-marostegui.json
- 02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66862 and previous config saved to /var/cache/conftool/dbconfig/20240721-020121-marostegui.json
- 02:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 02:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66861 and previous config saved to /var/cache/conftool/dbconfig/20240721-020059-marostegui.json
- 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66860 and previous config saved to /var/cache/conftool/dbconfig/20240721-014552-marostegui.json
- 01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66859 and previous config saved to /var/cache/conftool/dbconfig/20240721-013044-marostegui.json
- 01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66858 and previous config saved to /var/cache/conftool/dbconfig/20240721-011537-marostegui.json
2024-07-20
- 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66857 and previous config saved to /var/cache/conftool/dbconfig/20240720-190046-marostegui.json
- 19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66856 and previous config saved to /var/cache/conftool/dbconfig/20240720-190024-marostegui.json
- 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66855 and previous config saved to /var/cache/conftool/dbconfig/20240720-184516-marostegui.json
- 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66854 and previous config saved to /var/cache/conftool/dbconfig/20240720-183009-marostegui.json
- 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66853 and previous config saved to /var/cache/conftool/dbconfig/20240720-181502-marostegui.json
- 14:30 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 14:16 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 14:16 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 14:15 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
- 14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
- 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
- 14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
- 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 14:14 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 14:10 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 14:10 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
- 14:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
- 14:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 14:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 14:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 13:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
- 13:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
- 13:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 13:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
- 13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
- 13:45 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1005
- 13:45 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
- 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 13:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
- 13:34 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
- 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 13:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 13:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 13:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 08:15 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 08:15 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 08:15 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 08:15 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 08:15 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 08:15 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 06:21 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66852 and previous config saved to /var/cache/conftool/dbconfig/20240720-033501-marostegui.json
- 03:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 03:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66851 and previous config saved to /var/cache/conftool/dbconfig/20240720-011705-marostegui.json
- 01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 01:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66850 and previous config saved to /var/cache/conftool/dbconfig/20240720-011643-marostegui.json
- 01:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66849 and previous config saved to /var/cache/conftool/dbconfig/20240720-010136-marostegui.json
- 00:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66848 and previous config saved to /var/cache/conftool/dbconfig/20240720-004629-marostegui.json
- 00:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66847 and previous config saved to /var/cache/conftool/dbconfig/20240720-003122-marostegui.json
- 00:26 jclark@cumin1002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
- 00:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
2024-07-19
- 21:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bookworm
- 20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
- 20:49 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
- 20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bookworm
- 17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
- 17:20 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
- 17:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 17:13 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 17:12 topranks: adding irb ints for row c/d vlans to codfw leaf switches in those rows T364095
- 17:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 16:13 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 16:11 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 15:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit2003']
- 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
- 15:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
- 15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2037.codfw.wmnet with OS bullseye
- 15:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
- 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
- 15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
- 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit2003']
- 15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
- 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
- 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
- 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2002.codfw.wmnet
- 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
- 15:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
- 15:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
- 15:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
- 15:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 15:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
- 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2035.codfw.wmnet with OS bullseye
- 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2036.codfw.wmnet with OS bullseye
- 14:49 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2002.codfw.wmnet
- 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 14:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
- 14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
- 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 14:39 godog: power off centrallog1002 for network upgrade - T369825
- 14:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
- 14:38 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
- 14:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2037.codfw.wmnet with OS bullseye
- 14:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
- 14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
- 14:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
- 14:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
- 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
- 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
- 14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2036.codfw.wmnet with OS bullseye
- 14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2035.codfw.wmnet with OS bullseye
- 14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2439 to wikikube-worker2038
- 14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2038
- 14:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2038
- 14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
- 14:05 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
- 14:03 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1001.eqiad.wmnet
- 14:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 14:02 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1001.eqiad.wmnet
- 14:02 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2439 to wikikube-worker2038
- 14:02 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1002.eqiad.wmnet
- 14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2438 to wikikube-worker2037
- 14:01 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1002.eqiad.wmnet
- 14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2037
- 13:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2037
- 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
- 13:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
- 13:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 13:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2438 to wikikube-worker2037
- 13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2433 to wikikube-worker2036
- 13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2036
- 13:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2036
- 13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
- 13:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
- 13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 13:48 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2433 to wikikube-worker2036
- 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2432 to wikikube-worker2035
- 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2035
- 13:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2035
- 13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
- 13:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
- 13:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 13:39 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2432 to wikikube-worker2035
- 13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 13:21 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:49 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:47 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.convert-disks (exit_code=0) for host mw2439
- 12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66843 and previous config saved to /var/cache/conftool/dbconfig/20240719-122320-arnaudb.json
- 12:20 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 12:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66842 and previous config saved to /var/cache/conftool/dbconfig/20240719-121933-marostegui.json
- 12:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 12:13 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 12:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 12:12 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 12:12 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 12:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 12:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66841 and previous config saved to /var/cache/conftool/dbconfig/20240719-120426-marostegui.json
- 12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66840 and previous config saved to /var/cache/conftool/dbconfig/20240719-114919-marostegui.json
- 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66839 and previous config saved to /var/cache/conftool/dbconfig/20240719-113412-marostegui.json
- 11:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 11:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 10:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 10:49 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
- 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 10:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 10:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 10:38 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 10:37 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 10:37 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 10:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 10:13 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 10:13 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 10:06 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 10:05 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 10:00 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 10:00 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 09:58 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
- 09:54 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
- 09:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 09:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 09:35 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 09:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 09:35 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 09:35 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 09:32 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
- 09:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
- 09:21 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
- 09:21 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
- 08:16 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 08:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 08:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
- 08:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 08:05 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 02:50 eileen: civicrm upgraded from 384fe444 to a9ef8ab9
- 00:28 zabe@deploy1002: sync-world aborted: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263) (duration: 01m 33s)
- 00:26 zabe@deploy1002: Started scap sync-world: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263)
2024-07-18
- 23:57 topranks: re-enable ssw<->ssw bgp in codfw to move east-west traffic away from CRs T369274
- 23:46 topranks: move IP GW for vlan private1-d-codfw to ssw1-d1-codfw and ssw1-d8-codfw T369274
- 23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
- 23:44 topranks: remove VRRP group for private1-d-codfw vlan on cr1-codfw and cr2-codfw
- 23:43 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
- 23:40 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 23:36 topranks: move outbound gateway for private1-d-codfw vlan from cr1-codfw to ssw1-d1-codfw
- 23:31 topranks: disable IPv6 RA generation for private1-d-codfw vlan on cr1-codfw and cr2-codfw T369274
- 23:17 topranks: enable IPv6 RA generation for private1-d-codfw vlan from ssw1-d1-codfw and ssw1-d8-codfw T369274
- 23:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66838 and previous config saved to /var/cache/conftool/dbconfig/20240718-231639-marostegui.json
- 23:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 23:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 23:05 topranks: Remove VRRP group for vlan private1-c-codfw on cr1-codfw and cr2-codfw
- 22:49 topranks: Re-route outbound traffic for private1-c-codfw vlan on to ssw1-d1-codfw
- 22:33 topranks: Disable IPv6 RA generation for private1-c-codfw vlan on cr1-codfw and cr2-codfw T369274
- 22:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
- 22:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
- 22:15 topranks: add IP interfaces for private1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw
- 22:03 topranks: move GW IPs for public1-d-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
- 21:58 topranks: remove VRRP group on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
- 21:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
- 21:57 bking@cumin2002: START - Cookbook sre.elasticsearch.force-shard-allocation
- 21:39 topranks: disable IPv6 RA generation on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
- 21:21 topranks: enable IPv6 RA generation on ssw1-d1-codfw and ssw1-d8-codfw for public1-d-codfw vlan T369274
- 21:14 dancy@deploy1002: Finished scap: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) (duration: 12m 02s)
- 21:09 dancy@deploy1002: suecarmol, dancy: Continuing with sync
- 21:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 21:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 21:04 dancy@deploy1002: suecarmol, dancy: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:02 dancy@deploy1002: Started scap sync-world: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161)
- 21:01 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.14 refs T366959
- 20:52 dancy@deploy1002: Finished scap: Backport for Fixes client preferences error (T370441) (duration: 11m 22s)
- 20:49 topranks: remove VRRP for public1-c-codfw vlan from cr1-codfw and cr2-codfw T369274
- 20:47 dancy@deploy1002: dancy, jdlrobson: Continuing with sync
- 20:43 dancy@deploy1002: dancy, jdlrobson: Backport for Fixes client preferences error (T370441) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:41 dancy@deploy1002: Started scap sync-world: Backport for Fixes client preferences error (T370441)
- 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66836 and previous config saved to /var/cache/conftool/dbconfig/20240718-202511-marostegui.json
- 20:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 20:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66835 and previous config saved to /var/cache/conftool/dbconfig/20240718-202449-marostegui.json
- 20:04 topranks: enabling IPv6 RA generation for public1-c-codfw on ssw1-d1-codfw and ssw1-d8-codfw T369274
- 19:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P66832 and previous config saved to /var/cache/conftool/dbconfig/20240718-195434-marostegui.json
- 19:54 dancy@deploy1002: Finished scap: Backport for [i18n] Change the names of the Arabic months (T370456) (duration: 10m 23s)
- 19:47 dancy@deploy1002: dancy: Continuing with sync
- 19:46 dancy@deploy1002: dancy: Backport for [i18n] Change the names of the Arabic months (T370456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 19:43 dancy@deploy1002: Started scap sync-world: Backport for [i18n] Change the names of the Arabic months (T370456)
- 19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
- 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
- 19:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66831 and previous config saved to /var/cache/conftool/dbconfig/20240718-193927-marostegui.json
- 19:38 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 19:37 topranks: add IRB int on public1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
- 19:37 denisse: Send SIGQUIT signal to the benthos service after a goroutine was waiting forever in webrequest_live.yaml - T369256
- 19:34 topranks: disable BGP between spine switches in rows A and D prior to enabling IP GW (T369274)
- 19:32 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
- 19:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
- 19:12 topranks: enabling BGP session from cr1-codfw to ssw1-d1-codfw
- 19:07 dancy@deploy1002: Installing scap version "4.93.0" for 232 hosts
- 18:30 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
- 18:27 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 18:17 swfrench-wmf: api-ro.discovery.wmnet now resolves to failoid - T367949
- 18:03 swfrench-wmf: appservers-ro.discovery.wmnet now resolves to failoid - T367949
- 18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
- 18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66829 and previous config saved to /var/cache/conftool/dbconfig/20240718-174547-root.json
- 17:43 topranks: disabling cr2-codfw port et-1/1/0 connecting to asw-c-codfw T366941
- 17:38 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 17:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
- 17:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
- 17:29 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
- 17:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2438
- 17:24 topranks: making cr1-codfw interfaces connecting ssw1-d1-codfw VRRP master for row c & d vlans T366941
- 17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
- 17:20 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
- 17:20 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
- 17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
- 17:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
- 17:15 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
- 17:15 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
- 17:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
- 17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
- 17:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
- 17:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
- 16:52 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
- 16:39 topranks: resetting line card 1/1 on cr1-codfw (T366941)
- 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2438.codfw.wmnet
- 16:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2438.codfw.wmnet
- 16:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
- 16:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
- 16:34 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
- 16:32 papaul: re-enable option 82 on lsw1-b7-codfw
- 16:26 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
- 16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
- 16:24 papaul: disable option 82 on lsw1-b7-codfw to test pxe boot issue
- 16:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
- 16:21 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
- 16:21 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
- 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2433.codfw.wmnet
- 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
- 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
- 16:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
- 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
- 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
- 16:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
- 15:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66827 and previous config saved to /var/cache/conftool/dbconfig/20240718-153748-arnaudb.json
- 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66826 and previous config saved to /var/cache/conftool/dbconfig/20240718-153731-arnaudb.json
- 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66825 and previous config saved to /var/cache/conftool/dbconfig/20240718-153718-arnaudb.json
- 15:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
- 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66824 and previous config saved to /var/cache/conftool/dbconfig/20240718-152243-arnaudb.json
- 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66823 and previous config saved to /var/cache/conftool/dbconfig/20240718-152225-arnaudb.json
- 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66822 and previous config saved to /var/cache/conftool/dbconfig/20240718-152213-arnaudb.json
- 15:19 topranks: disabling interface et-1/1/3 on cr1-codfw (facing asw-d-codfw) T366941
- 15:17 topranks: disabling interface et-1/1/0 on cr1-codfw (facing asw-c-codfw) T366941
- 15:13 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 15:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
- 15:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
- 15:12 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
- 15:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics@cde3c31]: (no justification provided) (duration: 00m 30s)
- 15:08 mforns@deploy1002: Started deploy [airflow-dags/analytics@cde3c31]: (no justification provided)
- 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66821 and previous config saved to /var/cache/conftool/dbconfig/20240718-150737-arnaudb.json
- 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66820 and previous config saved to /var/cache/conftool/dbconfig/20240718-150720-arnaudb.json
- 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66819 and previous config saved to /var/cache/conftool/dbconfig/20240718-150708-arnaudb.json
- 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
- 14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
- 14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mw2433.codfw.wmnet
- 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66818 and previous config saved to /var/cache/conftool/dbconfig/20240718-145232-arnaudb.json
- 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66817 and previous config saved to /var/cache/conftool/dbconfig/20240718-145214-arnaudb.json
- 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: maintenance rescheduled', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240718-145157-arnaudb.json
- 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66816 and previous config saved to /var/cache/conftool/dbconfig/20240718-144754-arnaudb.json
- 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
- 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
- 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
- 14:38 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
- 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
- 14:17 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
- 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
- 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
- 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
- 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
- 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
- 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
- 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
- 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
- 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
- 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
- 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
- 13:50 brett: Release ncmonitor 1.1.0-1 to bookworm-wikimedia
- 13:46 Dreamy_Jazz: Afternoon UTC backport window done
- 13:44 dreamyjazz@deploy1002: Finished scap: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) (duration: 09m 59s)
- 13:39 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Continuing with sync
- 13:36 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:34 dreamyjazz@deploy1002: Started scap sync-world: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326)
- 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
- 13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2432.codfw.wmnet with OS buster
- 12:55 topranks: re-enabling interface et-1/0/2 on cr2-codfw which connects to ssw1-d8-codfw (problematic IP interfaces have been deleted) T366941
- 12:52 topranks: re-enabling BGP between spine-layer switches in codfw (problematic IP interfaces have been deleted) T366941
- 12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
- 12:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
- 12:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 12:40 dreamyjazz@deploy1002: Finished scap: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) (duration: 09m 10s)
- 12:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 12:35 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
- 12:34 dreamyjazz@deploy1002: dreamyjazz: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
- 12:32 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:32 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 12:30 dreamyjazz@deploy1002: Started scap sync-world: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924)
- 12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
- 12:25 elukey: update spicerack to 8.8.0 on cumin1002
- 12:14 claime: restarting sync-puppet-volatile on puppetserver2001
- 12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
- 12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
- 11:15 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 11:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
- 11:13 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
- 11:12 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 11:12 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 11:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:10 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 11:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 11:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 11:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:05 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 11:04 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 11:04 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 11:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 10:54 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
- 10:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2432
- 10:17 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 10:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 10:04 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 09:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 09:46 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 09:46 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 09:44 elukey: upgrade spicerack to 8.8.0 on cumin2002 - testing the new release
- 09:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 09:26 elukey: uploaded spicerack_8.8.0 to apt.wikimedia.org bullseye-wikimedia
- 09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 09:08 btullis: disabled check-private-data.timer on clouddb1021, pending decom.
- 09:06 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:06 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 09:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:02 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 08:56 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 08:55 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 08:51 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 08:51 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 08:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 08:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 08:13 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
- 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66806 and previous config saved to /var/cache/conftool/dbconfig/20240718-043817-marostegui.json
- 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66805 and previous config saved to /var/cache/conftool/dbconfig/20240718-043739-marostegui.json
- 04:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66804 and previous config saved to /var/cache/conftool/dbconfig/20240718-042232-marostegui.json
- 04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66803 and previous config saved to /var/cache/conftool/dbconfig/20240718-040725-marostegui.json
- 03:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66802 and previous config saved to /var/cache/conftool/dbconfig/20240718-035218-marostegui.json
- 00:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
- 00:35 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
- 00:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66801 and previous config saved to /var/cache/conftool/dbconfig/20240718-000500-arnaudb.json
2024-07-17
- 23:50 mutante: phabricator (phab1004) - deployed gerrit:1054907 ; restarted apache
- 23:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66800 and previous config saved to /var/cache/conftool/dbconfig/20240717-234953-arnaudb.json
- 23:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66799 and previous config saved to /var/cache/conftool/dbconfig/20240717-233446-arnaudb.json
- 23:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66798 and previous config saved to /var/cache/conftool/dbconfig/20240717-231939-arnaudb.json
- 23:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66797 and previous config saved to /var/cache/conftool/dbconfig/20240717-231612-arnaudb.json
- 23:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 23:16 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 23:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66796 and previous config saved to /var/cache/conftool/dbconfig/20240717-231550-arnaudb.json
- 23:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1006
- 23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
- 23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
- 23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
- 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
- 23:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
- 23:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66795 and previous config saved to /var/cache/conftool/dbconfig/20240717-230043-arnaudb.json
- 22:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66794 and previous config saved to /var/cache/conftool/dbconfig/20240717-224536-arnaudb.json
- 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
- 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
- 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
- 22:37 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php aewikimedia "Reda Kerbouche" REDACTED --bureaucrat --sysop # T362529
- 22:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66793 and previous config saved to /var/cache/conftool/dbconfig/20240717-223028-arnaudb.json
- 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66792 and previous config saved to /var/cache/conftool/dbconfig/20240717-222701-arnaudb.json
- 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 22:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 22:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 22:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66791 and previous config saved to /var/cache/conftool/dbconfig/20240717-222530-arnaudb.json
- 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
- 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
- 22:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
- 22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66790 and previous config saved to /var/cache/conftool/dbconfig/20240717-221023-arnaudb.json
- 22:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 21:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66789 and previous config saved to /var/cache/conftool/dbconfig/20240717-215516-arnaudb.json
- 21:51 eileen: civicrm upgraded from 1ac3e7be to 384fe444
- 21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66788 and previous config saved to /var/cache/conftool/dbconfig/20240717-214008-arnaudb.json
- 21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66787 and previous config saved to /var/cache/conftool/dbconfig/20240717-213641-arnaudb.json
- 21:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 21:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66786 and previous config saved to /var/cache/conftool/dbconfig/20240717-213619-arnaudb.json
- away: UTC late deploys done
- 21:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66785 and previous config saved to /var/cache/conftool/dbconfig/20240717-212112-arnaudb.json
- 21:19 tgr@deploy1002: Finished scap: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) (duration: 16m 59s)
- 21:14 tgr@deploy1002: tgr, ksarabia: Continuing with sync
- 21:08 tgr@deploy1002: tgr, ksarabia: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66784 and previous config saved to /var/cache/conftool/dbconfig/20240717-210605-arnaudb.json
- 21:02 tgr@deploy1002: Started scap sync-world: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150)
- 21:01 tgr@deploy1002: Finished scap: Backport for SUL3: Fix URL handling for the SSO domain (T365162) (duration: 42m 33s)
- 20:54 tgr@deploy1002: tgr: Continuing with sync
- 20:53 tgr@deploy1002: tgr: Backport for SUL3: Fix URL handling for the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66783 and previous config saved to /var/cache/conftool/dbconfig/20240717-205058-arnaudb.json
- 20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66782 and previous config saved to /var/cache/conftool/dbconfig/20240717-204731-arnaudb.json
- 20:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 20:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66781 and previous config saved to /var/cache/conftool/dbconfig/20240717-204709-arnaudb.json
- 20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66780 and previous config saved to /var/cache/conftool/dbconfig/20240717-203202-arnaudb.json
- 20:18 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix URL handling for the SSO domain (T365162)
- 20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66779 and previous config saved to /var/cache/conftool/dbconfig/20240717-201655-arnaudb.json
- 20:14 tgr@deploy1002: Finished scap: Backport for SUL3: Fix cookie names on the SSO domain (T365162) (duration: 09m 23s)
- 20:12 topranks: rebooting unused switch ssw1-d8-codfw in an effort to troubleshoot gnmic errors
- 20:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
- 20:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
- 20:09 tgr@deploy1002: tgr: Continuing with sync
- 20:07 tgr@deploy1002: tgr: Backport for SUL3: Fix cookie names on the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix cookie names on the SSO domain (T365162)
- 20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66778 and previous config saved to /var/cache/conftool/dbconfig/20240717-200147-arnaudb.json
- 19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66777 and previous config saved to /var/cache/conftool/dbconfig/20240717-195921-arnaudb.json
- 19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 19:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 19:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 19:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66776 and previous config saved to /var/cache/conftool/dbconfig/20240717-195844-arnaudb.json
- 19:45 eileen: config revision changed from 85336766 to 4ea1c745
- 19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66775 and previous config saved to /var/cache/conftool/dbconfig/20240717-194337-arnaudb.json
- 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66774 and previous config saved to /var/cache/conftool/dbconfig/20240717-192830-arnaudb.json
- 19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66773 and previous config saved to /var/cache/conftool/dbconfig/20240717-191324-arnaudb.json
- 19:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66772 and previous config saved to /var/cache/conftool/dbconfig/20240717-191057-arnaudb.json
- 19:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 19:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 19:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66771 and previous config saved to /var/cache/conftool/dbconfig/20240717-191035-arnaudb.json
- 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66770 and previous config saved to /var/cache/conftool/dbconfig/20240717-185528-arnaudb.json
- 18:46 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
- 18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66769 and previous config saved to /var/cache/conftool/dbconfig/20240717-184021-arnaudb.json
- 18:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66768 and previous config saved to /var/cache/conftool/dbconfig/20240717-182514-arnaudb.json
- 18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66767 and previous config saved to /var/cache/conftool/dbconfig/20240717-182147-arnaudb.json
- 18:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 18:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66766 and previous config saved to /var/cache/conftool/dbconfig/20240717-182125-arnaudb.json
- 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
- 18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66765 and previous config saved to /var/cache/conftool/dbconfig/20240717-180617-arnaudb.json
- 18:01 topranks: adjust route preference for traffic to AWS on Eqiad core routers T370297
- 17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66764 and previous config saved to /var/cache/conftool/dbconfig/20240717-175110-arnaudb.json
- 17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66763 and previous config saved to /var/cache/conftool/dbconfig/20240717-173603-arnaudb.json
- 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66762 and previous config saved to /var/cache/conftool/dbconfig/20240717-173336-arnaudb.json
- 17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
- 17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 17:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66761 and previous config saved to /var/cache/conftool/dbconfig/20240717-173257-arnaudb.json
- 17:27 mutante: removing integration.mediawiki.org from DNS - T361250
- 17:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66760 and previous config saved to /var/cache/conftool/dbconfig/20240717-171750-arnaudb.json
- 17:13 inflatador: bking@kafka-main2005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
- 17:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66759 and previous config saved to /var/cache/conftool/dbconfig/20240717-170243-arnaudb.json
- 16:59 btullis@deploy1002: Finished deploy [airflow-dags/analytics@ca21d05]: (no justification provided) (duration: 00m 51s)
- 16:58 btullis@deploy1002: Started deploy [airflow-dags/analytics@ca21d05]: (no justification provided)
- 16:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66758 and previous config saved to /var/cache/conftool/dbconfig/20240717-164736-arnaudb.json
- 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66757 and previous config saved to /var/cache/conftool/dbconfig/20240717-164521-arnaudb.json
- 16:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66756 and previous config saved to /var/cache/conftool/dbconfig/20240717-164459-arnaudb.json
- 16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 16:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 16:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 16:31 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 16:30 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:30 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859] (duration: 04m 08s)
- 16:29 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66755 and previous config saved to /var/cache/conftool/dbconfig/20240717-162952-arnaudb.json
- 16:26 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859]
- 16:21 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859] (duration: 07m 59s)
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66754 and previous config saved to /var/cache/conftool/dbconfig/20240717-161445-arnaudb.json
- 16:13 otto@deploy1002: Started deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859]
- 16:08 inflatador: bking@kafka-main1005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
- 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66752 and previous config saved to /var/cache/conftool/dbconfig/20240717-155937-arnaudb.json
- 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66751 and previous config saved to /var/cache/conftool/dbconfig/20240717-155628-arnaudb.json
- 15:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66750 and previous config saved to /var/cache/conftool/dbconfig/20240717-155606-arnaudb.json
- 15:53 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859] (duration: 03m 33s)
- 15:50 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859]
- 15:46 otto@deploy1002: Finished deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e] (duration: 03m 27s)
- 15:42 otto@deploy1002: Started deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e]
- 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66748 and previous config saved to /var/cache/conftool/dbconfig/20240717-154059-arnaudb.json
- 15:38 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
- 15:37 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
- 15:35 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
- 15:35 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
- 15:33 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
- 15:32 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
- 15:32 topranks: Adjust anycast route policy at Chicago Network POP cr2-eqord to announce anycast ranges T367439
- 15:30 sukhe: sudo cumin "A:lvs" "run-puppet-agent" to pick up apus change
- 15:29 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
- 15:28 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
- 15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66747 and previous config saved to /var/cache/conftool/dbconfig/20240717-152552-arnaudb.json
- 15:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:23 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:23 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:22 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2007.codfw.wmnet with OS bookworm
- 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:21 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apus.discovery.wmnet on all recursors
- 15:20 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache apus.discovery.wmnet on all recursors
- 15:20 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:19 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 15:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 15:18 sukhe: running authdns-update for CR 1054346
- 15:16 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 15:16 sukhe: cumin 'A:dnsbox' 'run-puppet-agent': T279621
- 15:13 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 15:12 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 15:11 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66745 and previous config saved to /var/cache/conftool/dbconfig/20240717-151045-arnaudb.json
- 15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66744 and previous config saved to /var/cache/conftool/dbconfig/20240717-150833-arnaudb.json
- 15:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 15:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66743 and previous config saved to /var/cache/conftool/dbconfig/20240717-150811-arnaudb.json
- 15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
- 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
- 14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66742 and previous config saved to /var/cache/conftool/dbconfig/20240717-145303-arnaudb.json
- 14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 14:46 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
- 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66741 and previous config saved to /var/cache/conftool/dbconfig/20240717-144415-marostegui.json
- 14:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 14:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66740 and previous config saved to /var/cache/conftool/dbconfig/20240717-143756-arnaudb.json
- 14:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 14:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66739 and previous config saved to /var/cache/conftool/dbconfig/20240717-142908-marostegui.json
- 14:27 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 14:27 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 14:27 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 14:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for durum3003.esams.wmnet
- 14:26 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for durum3003.esams.wmnet
- 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66738 and previous config saved to /var/cache/conftool/dbconfig/20240717-142249-arnaudb.json
- 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
- 14:22 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
- 14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2008.codfw.wmnet with OS bookworm
- 14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66737 and previous config saved to /var/cache/conftool/dbconfig/20240717-141939-arnaudb.json
- 14:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 14:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66736 and previous config saved to /var/cache/conftool/dbconfig/20240717-141929-arnaudb.json
- 14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 14:17 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:17 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:16 sukhe: [durum3003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
- 14:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66735 and previous config saved to /var/cache/conftool/dbconfig/20240717-141401-marostegui.json
- 14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:11 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 14:11 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 14:06 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 14:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66734 and previous config saved to /var/cache/conftool/dbconfig/20240717-140423-arnaudb.json
- 14:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
- 13:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 13:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
- 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66733 and previous config saved to /var/cache/conftool/dbconfig/20240717-135854-marostegui.json
- 13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 13:54 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 13:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 13:53 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66732 and previous config saved to /var/cache/conftool/dbconfig/20240717-134916-arnaudb.json
- 13:43 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 13:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
- 13:40 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 13:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 13:36 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2008.codfw.wmnet with OS bookworm
- 13:34 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66730 and previous config saved to /var/cache/conftool/dbconfig/20240717-133408-arnaudb.json
- 13:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 13:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 13:29 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 13:26 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 13:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
- 13:19 urbanecm: Stop revalidateLinkRecommendation for azwiki; restart as `[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --olderThan=20240104000000 --verbose` instead (T370262)
- 13:13 urbanecm@deploy1002: Finished scap: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) (duration: 10m 06s)
- 13:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # T370262
- 13:08 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Continuing with sync
- 13:07 sukhe: [intentional] stop nginx.service on durum1001
- 13:05 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:03 urbanecm@deploy1002: Started scap sync-world: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587)
- 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66729 and previous config saved to /var/cache/conftool/dbconfig/20240717-123352-arnaudb.json
- 12:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 12:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66728 and previous config saved to /var/cache/conftool/dbconfig/20240717-123341-arnaudb.json
- 12:31 urbanecm: Community configuration deployment finished
- 12:29 urbanecm@deploy1002: Finished scap: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) (duration: 08m 30s)
- 12:24 urbanecm@deploy1002: urbanecm: Continuing with sync
- 12:23 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:21 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458)
- 12:19 urbanecm@deploy1002: Sync cancelled.
- 12:19 urbanecm: (relogging to attach to the task) migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724 (T366458)
- 12:18 urbanecm: migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724
- 12:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66725 and previous config saved to /var/cache/conftool/dbconfig/20240717-121834-arnaudb.json
- 12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66723 and previous config saved to /var/cache/conftool/dbconfig/20240717-120327-arnaudb.json
- 11:57 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 11:52 urbanecm: [urbanecm@mwdebug1001 ~]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php # T366458; output logged to migrateCommunityConfig.log in my home
- 11:51 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:49 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458)
- 11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66722 and previous config saved to /var/cache/conftool/dbconfig/20240717-114820-arnaudb.json
- 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66721 and previous config saved to /var/cache/conftool/dbconfig/20240717-114510-arnaudb.json
- 11:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 11:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66720 and previous config saved to /var/cache/conftool/dbconfig/20240717-114426-arnaudb.json
- 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'Increase db2136's weight - testing 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P66719 and previous config saved to /var/cache/conftool/dbconfig/20240717-114032-marostegui.json
- 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66718 and previous config saved to /var/cache/conftool/dbconfig/20240717-113954-marostegui.json
- 11:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 11:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66717 and previous config saved to /var/cache/conftool/dbconfig/20240717-113932-marostegui.json
- 11:38 _joe_: deleted pod that was reportedly returning 5xx to the cdn for mw-api-ext
- 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66716 and previous config saved to /var/cache/conftool/dbconfig/20240717-112919-arnaudb.json
- 11:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
- 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
- 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66715 and previous config saved to /var/cache/conftool/dbconfig/20240717-112425-marostegui.json
- 11:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
- 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
- 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66714 and previous config saved to /var/cache/conftool/dbconfig/20240717-111412-arnaudb.json
- 11:12 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-codfw
- 11:10 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d8-codfw
- 11:10 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-codfw
- 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66713 and previous config saved to /var/cache/conftool/dbconfig/20240717-110918-marostegui.json
- 11:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d7-codfw
- 11:08 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d6-codfw
- 11:05 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d6-codfw
- 11:05 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d5-codfw
- 11:03 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d5-codfw
- 11:03 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-codfw
- 11:01 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d4-codfw
- 11:01 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d3-codfw
- 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66712 and previous config saved to /var/cache/conftool/dbconfig/20240717-105904-arnaudb.json
- 10:58 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
- 10:58 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d2-codfw
- 10:56 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d2-codfw
- 10:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-codfw
- 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66711 and previous config saved to /var/cache/conftool/dbconfig/20240717-105411-marostegui.json
- 10:53 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c7-codfw
- 10:53 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-codfw
- 10:51 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c6-codfw
- 10:51 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-codfw
- 10:49 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c5-codfw
- 10:49 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-codfw
- 10:46 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c4-codfw
- 10:46 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-codfw
- 10:44 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c3-codfw
- 10:44 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-codfw
- 10:41 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c2-codfw
- 10:41 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c1-codfw
- 10:39 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c1-codfw
- 10:39 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-codfw
- 10:37 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d8-codfw
- 10:37 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d1-codfw
- 10:34 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d1-codfw
- 10:34 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b4-magru
- 10:32 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b4-magru
- 10:32 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
- 10:29 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
- 09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66710 and previous config saved to /var/cache/conftool/dbconfig/20240717-095845-arnaudb.json
- 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66709 and previous config saved to /var/cache/conftool/dbconfig/20240717-094412-root.json
- 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66708 and previous config saved to /var/cache/conftool/dbconfig/20240717-092907-root.json
- 09:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-magru
- 09:14 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr2-magru
- 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66706 and previous config saved to /var/cache/conftool/dbconfig/20240717-091402-root.json
- 09:13 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-magru
- 09:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr1-magru
- 09:02 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66705 and previous config saved to /var/cache/conftool/dbconfig/20240717-085857-root.json
- 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
- 08:48 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
- 08:47 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
- 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66704 and previous config saved to /var/cache/conftool/dbconfig/20240717-084351-root.json
- 08:06 kartik@deploy1002: Finished scap: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) (duration: 14m 26s)
- 08:00 kartik@deploy1002: abi, kartik: Continuing with sync
- 07:54 kartik@deploy1002: abi, kartik: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:51 kartik@deploy1002: Started scap sync-world: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219)
- 07:50 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 07:50 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 07:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 07:49 elukey: restart hadoop-mapreduce-historyserver.service on an-master1003 - failed due to Java OOM
- 07:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 07:38 elukey@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw
- 07:37 jayme: imported helm3 3.11.3 to bullseye-wikimedia and buster-wikimedia
- 07:36 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d1-codfw
- 06:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17072
- 06:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 17072
- 05:36 marostegui: Deploy schema change on s7 eqiad db1181 dbmaint T367856
- 05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
- 05:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
- 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1181 T370121', diff saved to https://phabricator.wikimedia.org/P66703 and previous config saved to /var/cache/conftool/dbconfig/20240717-053359-marostegui.json
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1236 to s7 primary and set section read-write T370121', diff saved to https://phabricator.wikimedia.org/P66702 and previous config saved to /var/cache/conftool/dbconfig/20240717-053302-root.json
- 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T370121', diff saved to https://phabricator.wikimedia.org/P66701 and previous config saved to /var/cache/conftool/dbconfig/20240717-053230-root.json
- 05:32 marostegui: Starting s7 eqiad failover from db1181 to db1236 - T370121
- 05:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
- 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1236 with weight 0 T370121', diff saved to https://phabricator.wikimedia.org/P66700 and previous config saved to /var/cache/conftool/dbconfig/20240717-051419-root.json
- 05:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
- 02:56 eileen: civicrm upgraded from 4f919c1e to 1ac3e7be
- 00:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 00:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
2024-07-16
- 23:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66699 and previous config saved to /var/cache/conftool/dbconfig/20240716-233336-arnaudb.json
- 23:25 cstone: civicrm upgraded from 8dbcdfb7 to 4f919c1e
- 23:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66698 and previous config saved to /var/cache/conftool/dbconfig/20240716-231829-arnaudb.json
- 23:04 eileen: config revision changed from a1ed167f to 85336766
- 23:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66697 and previous config saved to /var/cache/conftool/dbconfig/20240716-230322-arnaudb.json
- 22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66696 and previous config saved to /var/cache/conftool/dbconfig/20240716-224815-arnaudb.json
- 22:40 tzatziki: removing 9 files for legal compliance
- 22:37 eileen: civicrm upgraded from 3287ced0 to 8dbcdfb7
- 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66695 and previous config saved to /var/cache/conftool/dbconfig/20240716-222638-arnaudb.json
- 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66694 and previous config saved to /var/cache/conftool/dbconfig/20240716-222616-arnaudb.json
- 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66693 and previous config saved to /var/cache/conftool/dbconfig/20240716-221109-arnaudb.json
- 21:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
- 21:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66692 and previous config saved to /var/cache/conftool/dbconfig/20240716-215601-arnaudb.json
- 21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66691 and previous config saved to /var/cache/conftool/dbconfig/20240716-214054-arnaudb.json
- 21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66690 and previous config saved to /var/cache/conftool/dbconfig/20240716-211914-arnaudb.json
- 21:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 21:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 21:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66689 and previous config saved to /var/cache/conftool/dbconfig/20240716-211852-arnaudb.json
- 21:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66688 and previous config saved to /var/cache/conftool/dbconfig/20240716-210345-arnaudb.json
- 20:54 urbanecm@deploy1002: Finished scap: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) (duration: 08m 43s)
- 20:49 urbanecm@deploy1002: urbanecm, jdlrobson: Continuing with sync
- 20:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66687 and previous config saved to /var/cache/conftool/dbconfig/20240716-204838-arnaudb.json
- 20:48 urbanecm@deploy1002: urbanecm, jdlrobson: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:45 urbanecm@deploy1002: Started scap sync-world: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150)
- 20:39 urbanecm@deploy1002: Finished scap: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) (duration: 09m 55s)
- 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
- 20:34 urbanecm@deploy1002: urbanecm, migr: Continuing with sync
- 20:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66686 and previous config saved to /var/cache/conftool/dbconfig/20240716-203331-arnaudb.json
- 20:33 urbanecm@deploy1002: urbanecm, migr: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy2008.codfw.wmnet with OS bookworm
- 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
- 20:29 urbanecm@deploy1002: Started scap sync-world: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606)
- 20:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
- 20:14 urbanecm@deploy1002: Finished scap: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) (duration: 09m 31s)
- 20:12 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=eqiad [reason: Repooling to concentrate clients in eqiad - T367949]
- 20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66685 and previous config saved to /var/cache/conftool/dbconfig/20240716-201153-arnaudb.json
- 20:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 20:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66684 and previous config saved to /var/cache/conftool/dbconfig/20240716-201131-arnaudb.json
- 20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Continuing with sync
- 20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:05 urbanecm@deploy1002: Started scap sync-world: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)
- 19:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66683 and previous config saved to /var/cache/conftool/dbconfig/20240716-195624-arnaudb.json
- 19:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66682 and previous config saved to /var/cache/conftool/dbconfig/20240716-194117-arnaudb.json
- 19:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66681 and previous config saved to /var/cache/conftool/dbconfig/20240716-192610-arnaudb.json
- 19:25 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad [reason: Depooling ahead of turndown - T367949]
- 19:24 swfrench-wmf: depooling appservers-ro in eqiad, which is not used by remaining analytics workloads - T367949
- 19:18 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 19:18 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 19:17 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 19:15 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 19:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
- 19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66680 and previous config saved to /var/cache/conftool/dbconfig/20240716-190526-arnaudb.json
- 19:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 19:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66679 and previous config saved to /var/cache/conftool/dbconfig/20240716-190504-arnaudb.json
- 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66678 and previous config saved to /var/cache/conftool/dbconfig/20240716-185657-marostegui.json
- 18:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 18:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
- 18:51 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:50 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66677 and previous config saved to /var/cache/conftool/dbconfig/20240716-184956-arnaudb.json
- 18:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 18:45 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2007.codfw.wmnet with OS bookworm
- 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66675 and previous config saved to /var/cache/conftool/dbconfig/20240716-183449-arnaudb.json
- 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
- 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66674 and previous config saved to /var/cache/conftool/dbconfig/20240716-181942-arnaudb.json
- 18:14 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
- 18:00 dancy@deploy1002: Installing scap version "4.92.0" for 232 hosts
- 17:59 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9] (duration: 00m 47s)
- 17:58 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9]
- 17:58 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9] (duration: 02m 44s)
- 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66672 and previous config saved to /var/cache/conftool/dbconfig/20240716-175820-arnaudb.json
- 17:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 17:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 17:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 17:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66671 and previous config saved to /var/cache/conftool/dbconfig/20240716-175742-arnaudb.json
- 17:55 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9]
- 17:55 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9] (duration: 08m 33s)
- 17:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:47 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9]
- 17:47 otto@deploy1002: Finished deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9] (duration: 03m 23s)
- 17:46 swfrench-wmf: appservers-rw and api-rw now resolve to failoid - T367949
- 17:44 otto@deploy1002: Started deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9]
- 17:44 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
- 17:43 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
- 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66670 and previous config saved to /var/cache/conftool/dbconfig/20240716-174235-arnaudb.json
- 17:40 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
- 17:39 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
- 17:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66669 and previous config saved to /var/cache/conftool/dbconfig/20240716-172727-arnaudb.json
- 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2006.codfw.wmnet with OS bookworm
- 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 17:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66668 and previous config saved to /var/cache/conftool/dbconfig/20240716-171220-arnaudb.json
- 17:00 mutante: lists2001 - systemctl reset-failed after gerrit:1054610 to fix T370098
- 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
- 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
- 16:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66667 and previous config saved to /var/cache/conftool/dbconfig/20240716-165135-arnaudb.json
- 16:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 16:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66666 and previous config saved to /var/cache/conftool/dbconfig/20240716-164446-arnaudb.json
- 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66665 and previous config saved to /var/cache/conftool/dbconfig/20240716-164437-arnaudb.json
- 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66664 and previous config saved to /var/cache/conftool/dbconfig/20240716-164422-arnaudb.json
- 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2006.codfw.wmnet with OS bookworm
- 16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 16:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66663 and previous config saved to /var/cache/conftool/dbconfig/20240716-163059-arnaudb.json
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66662 and previous config saved to /var/cache/conftool/dbconfig/20240716-162940-arnaudb.json
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66661 and previous config saved to /var/cache/conftool/dbconfig/20240716-162931-arnaudb.json
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66660 and previous config saved to /var/cache/conftool/dbconfig/20240716-162916-arnaudb.json
- 16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
- 16:20 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
- 16:18 sukhe@cumin1002: START - Cookbook sre.dns.netbox
- 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66659 and previous config saved to /var/cache/conftool/dbconfig/20240716-161552-arnaudb.json
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66658 and previous config saved to /var/cache/conftool/dbconfig/20240716-161435-arnaudb.json
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66657 and previous config saved to /var/cache/conftool/dbconfig/20240716-161426-arnaudb.json
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66656 and previous config saved to /var/cache/conftool/dbconfig/20240716-161411-arnaudb.json
- 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66655 and previous config saved to /var/cache/conftool/dbconfig/20240716-160044-arnaudb.json
- 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66654 and previous config saved to /var/cache/conftool/dbconfig/20240716-155930-arnaudb.json
- 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66653 and previous config saved to /var/cache/conftool/dbconfig/20240716-155920-arnaudb.json
- 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66652 and previous config saved to /var/cache/conftool/dbconfig/20240716-155905-arnaudb.json
- 15:58 elukey: uploaded spicerack_8.7.0 to apt.wikimedia.org bullseye-wikimedia
- 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66651 and previous config saved to /var/cache/conftool/dbconfig/20240716-155221-root.json
- 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66650 and previous config saved to /var/cache/conftool/dbconfig/20240716-154537-arnaudb.json
- 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66649 and previous config saved to /var/cache/conftool/dbconfig/20240716-154424-arnaudb.json
- 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66648 and previous config saved to /var/cache/conftool/dbconfig/20240716-154415-arnaudb.json
- 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66647 and previous config saved to /var/cache/conftool/dbconfig/20240716-154401-arnaudb.json
- 15:39 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:39 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:37 papaul: reboot fpc0 on fasw-c-codfw.mgmt.codfw.wmnet
- 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66646 and previous config saved to /var/cache/conftool/dbconfig/20240716-153715-root.json
- 15:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:35 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:32 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66645 and previous config saved to /var/cache/conftool/dbconfig/20240716-152918-arnaudb.json
- 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66644 and previous config saved to /var/cache/conftool/dbconfig/20240716-152910-arnaudb.json
- 15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66643 and previous config saved to /var/cache/conftool/dbconfig/20240716-152855-arnaudb.json
- 15:27 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 15:27 claime: Uncordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
- 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66642 and previous config saved to /var/cache/conftool/dbconfig/20240716-152349-arnaudb.json
- 15:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
- 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66641 and previous config saved to /var/cache/conftool/dbconfig/20240716-152209-root.json
- 15:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 15:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 15:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66640 and previous config saved to /var/cache/conftool/dbconfig/20240716-151516-arnaudb.json
- 15:08 topranks: Rebooting lsw1-f2-eqiad to complete JunOS upgrade T365997
- 15:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
- 15:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
- 15:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
- 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66638 and previous config saved to /var/cache/conftool/dbconfig/20240716-150704-root.json
- 15:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
- 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109 (duration: 00m 52s)
- 15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
- 15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d
- 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109
- 15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109 (duration: 00m 34s)
- 15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
- 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109
- 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
- 15:01 urbanecm@deploy1002: Finished scap: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing in Centra… (gerrit:1053373)
- 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66637 and previous config saved to /var/cache/conftool/dbconfig/20240716-150007-arnaudb.json
- 14:53 urbanecm@deploy1002: dbrant, urbanecm: Continuing with sync
- 14:53 urbanecm@deploy1002: dbrant, urbanecm: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing in Cen… (gerrit:1053373)
- 14:53 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
- 14:53 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
- 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66636 and previous config saved to /var/cache/conftool/dbconfig/20240716-145159-root.json
- 14:49 sukhe: [durum1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
- 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
- 14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
- 14:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66635 and previous config saved to /var/cache/conftool/dbconfig/20240716-144500-arnaudb.json
- 14:44 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u1_amd64.changes: T370068
- 14:36 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 14:34 claime: Cordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
- 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
- 14:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
- 14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'T365997 - depool db1194-s7,db1200-s5,db1201-s6', diff saved to https://phabricator.wikimedia.org/P66634 and previous config saved to /var/cache/conftool/dbconfig/20240716-143306-arnaudb.json
- 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66633 and previous config saved to /var/cache/conftool/dbconfig/20240716-142953-arnaudb.json
- 14:26 urbanecm@deploy1002: Started scap sync-world: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing… (gerrit:1053373)
- 14:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66632 and previous config saved to /var/cache/conftool/dbconfig/20240716-142321-arnaudb.json
- 14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66631 and previous config saved to /var/cache/conftool/dbconfig/20240716-142029-arnaudb.json
- 14:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 14:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 14:10 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 14:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 14:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 14:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66630 and previous config saved to /var/cache/conftool/dbconfig/20240716-140522-arnaudb.json
- 14:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2432.codfw.wmnet
- 13:53 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2432.codfw.wmnet
- 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66629 and previous config saved to /var/cache/conftool/dbconfig/20240716-135015-arnaudb.json
- away: UTC afternoon deploys done
- 13:39 tgr@deploy1002: Finished scap: Backport for Handle sso.wikimedia.org domain (T365162) (duration: 19m 07s)
- 13:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66628 and previous config saved to /var/cache/conftool/dbconfig/20240716-133508-arnaudb.json
- 13:34 tgr@deploy1002: tgr: Continuing with sync
- 13:29 mforns@deploy1002: Finished deploy [airflow-dags/analytics@1ee55b8]: (no justification provided) (duration: 00m 30s)
- 13:29 mforns@deploy1002: Started deploy [airflow-dags/analytics@1ee55b8]: (no justification provided)
- 13:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66627 and previous config saved to /var/cache/conftool/dbconfig/20240716-132915-arnaudb.json
- 13:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 13:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66626 and previous config saved to /var/cache/conftool/dbconfig/20240716-132853-arnaudb.json
- 13:22 tgr@deploy1002: tgr: Backport for Handle sso.wikimedia.org domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:20 tgr@deploy1002: Started scap sync-world: Backport for Handle sso.wikimedia.org domain (T365162)
- 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) (duration: 10m 15s)
- 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66625 and previous config saved to /var/cache/conftool/dbconfig/20240716-131346-arnaudb.json
- 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Continuing with sync
- 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)
- 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66624 and previous config saved to /var/cache/conftool/dbconfig/20240716-125839-arnaudb.json
- 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66623 and previous config saved to /var/cache/conftool/dbconfig/20240716-124604-marostegui.json
- 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66622 and previous config saved to /var/cache/conftool/dbconfig/20240716-124543-marostegui.json
- 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66621 and previous config saved to /var/cache/conftool/dbconfig/20240716-124332-arnaudb.json
- 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66620 and previous config saved to /var/cache/conftool/dbconfig/20240716-123035-marostegui.json
- 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66619 and previous config saved to /var/cache/conftool/dbconfig/20240716-122039-root.json
- 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66618 and previous config saved to /var/cache/conftool/dbconfig/20240716-121528-marostegui.json
- 12:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
- 12:09 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
- 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66617 and previous config saved to /var/cache/conftool/dbconfig/20240716-120534-root.json
- 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66616 and previous config saved to /var/cache/conftool/dbconfig/20240716-120021-marostegui.json
- 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66615 and previous config saved to /var/cache/conftool/dbconfig/20240716-120012-marostegui.json
- 12:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 12:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66614 and previous config saved to /var/cache/conftool/dbconfig/20240716-115920-marostegui.json
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66613 and previous config saved to /var/cache/conftool/dbconfig/20240716-115028-root.json
- 11:49 effie: drain mw1496.eqiad.wmnet
- 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66611 and previous config saved to /var/cache/conftool/dbconfig/20240716-114315-arnaudb.json
- 11:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66610 and previous config saved to /var/cache/conftool/dbconfig/20240716-114254-arnaudb.json
- 11:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66608 and previous config saved to /var/cache/conftool/dbconfig/20240716-113523-root.json
- 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66607 and previous config saved to /var/cache/conftool/dbconfig/20240716-112746-arnaudb.json
- 11:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 11:20 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66606 and previous config saved to /var/cache/conftool/dbconfig/20240716-112017-root.json
- 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66605 and previous config saved to /var/cache/conftool/dbconfig/20240716-111239-arnaudb.json
- 11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66604 and previous config saved to /var/cache/conftool/dbconfig/20240716-110512-root.json
- 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66603 and previous config saved to /var/cache/conftool/dbconfig/20240716-105732-arnaudb.json
- 10:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66602 and previous config saved to /var/cache/conftool/dbconfig/20240716-105139-arnaudb.json
- 10:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 10:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66601 and previous config saved to /var/cache/conftool/dbconfig/20240716-105117-arnaudb.json
- 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66600 and previous config saved to /var/cache/conftool/dbconfig/20240716-105006-root.json
- 10:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66599 and previous config saved to /var/cache/conftool/dbconfig/20240716-103610-arnaudb.json
- 10:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66598 and previous config saved to /var/cache/conftool/dbconfig/20240716-102103-arnaudb.json
- 10:10 dcausse: T362529: creating aewikimedia CirrusSearch indices with 'mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=aewikimedia --cluster=all'
- 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66597 and previous config saved to /var/cache/conftool/dbconfig/20240716-100556-arnaudb.json
- 10:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66595 and previous config saved to /var/cache/conftool/dbconfig/20240716-100002-arnaudb.json
- 09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 09:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66594 and previous config saved to /var/cache/conftool/dbconfig/20240716-095939-arnaudb.json
- 09:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:53 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 09:52 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:52 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:50 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 09:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66593 and previous config saved to /var/cache/conftool/dbconfig/20240716-094432-arnaudb.json
- 09:44 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 09:42 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 09:39 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 09:37 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 09:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 09:32 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database aewikimedia (T362529)
- 09:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66592 and previous config saved to /var/cache/conftool/dbconfig/20240716-092924-arnaudb.json
- 09:23 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 09:20 godog: bounce benthos@mw_accesslog_sampler - T369256
- 09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66591 and previous config saved to /var/cache/conftool/dbconfig/20240716-091418-arnaudb.json
- 09:12 elukey: update docker-registry to 0.0.14-1 on build2001
- 09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 09:11 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 09:11 elukey: update docker-report to 0.0.14-1 on bullseye-wikimedia
- 09:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database aewikimedia (T362529)
- 09:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 09:02 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 08:50 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 08:32 godog: root@kafka-logging1001:~# kafka topics --alter --topic mediawiki.httpd.accesslog --partitions 12 - T369256
- 08:31 marostegui: Clone dbstore1008:3317 from db1174 T370122
- 08:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
- 08:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
- 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66589 and previous config saved to /var/cache/conftool/dbconfig/20240716-082727-root.json
- 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66588 and previous config saved to /var/cache/conftool/dbconfig/20240716-082213-root.json
- 08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66587 and previous config saved to /var/cache/conftool/dbconfig/20240716-081401-arnaudb.json
- 08:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 08:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66586 and previous config saved to /var/cache/conftool/dbconfig/20240716-081129-root.json
- 08:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 08:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66585 and previous config saved to /var/cache/conftool/dbconfig/20240716-080720-root.json
- 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66584 and previous config saved to /var/cache/conftool/dbconfig/20240716-080707-root.json
- 07:46 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
- 07:40 Dreamy_Jazz: Morning UTC backport window done
- 07:38 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
- 07:38 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
- 07:29 Dreamy_Jazz: Restarted MediaModeration scanning script
- 07:28 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
- 07:19 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) (duration: 12m 09s)
- 07:14 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
- 07:14 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:13 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:13 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
- 07:10 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
- 07:07 dreamyjazz@deploy1002: Started scap sync-world: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)
- 07:07 volans@cumin1002: START - Cookbook sre.dns.netbox
- 06:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52999
- 06:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52999
- 06:18 kart_: Updated cxserver to 2024-07-15-100650-production (T354666)
- 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 06:16 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 06:12 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 06:12 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 06:11 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 06:06 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 06:05 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:43 marostegui: Deploy schema change on s7 eqiad db1174 dbmaint T367856
- 05:43 marostegui: Deploy schema change on s3 eqiad db1157 dbmaint T367856
- 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
- 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
- 05:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
- 05:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
- 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1157 T370019', diff saved to https://phabricator.wikimedia.org/P66581 and previous config saved to /var/cache/conftool/dbconfig/20240716-051718-root.json
- 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T370019', diff saved to https://phabricator.wikimedia.org/P66580 and previous config saved to /var/cache/conftool/dbconfig/20240716-051538-root.json
- 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T370019', diff saved to https://phabricator.wikimedia.org/P66579 and previous config saved to /var/cache/conftool/dbconfig/20240716-051516-root.json
- 05:15 marostegui: Starting s3 eqiad failover from db1157 to db1223 - T370019
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1223 with weight 0 T370019', diff saved to https://phabricator.wikimedia.org/P66578 and previous config saved to /var/cache/conftool/dbconfig/20240716-045839-root.json
- 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
- 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66577 and previous config saved to /var/cache/conftool/dbconfig/20240716-045807-marostegui.json
- 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
- 04:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
- 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.11 (duration: 00m 58s)
- 03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.14 refs T366959 (duration: 50m 56s)
- 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.14 refs T366959
- 02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66576 and previous config saved to /var/cache/conftool/dbconfig/20240716-025545-arnaudb.json
- 02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66575 and previous config saved to /var/cache/conftool/dbconfig/20240716-024038-arnaudb.json
- 02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66574 and previous config saved to /var/cache/conftool/dbconfig/20240716-022531-arnaudb.json
- 02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66573 and previous config saved to /var/cache/conftool/dbconfig/20240716-021023-arnaudb.json
- 02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66572 and previous config saved to /var/cache/conftool/dbconfig/20240716-020751-arnaudb.json
- 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 01:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 01:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 01:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66570 and previous config saved to /var/cache/conftool/dbconfig/20240716-012125-arnaudb.json
- 01:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66569 and previous config saved to /var/cache/conftool/dbconfig/20240716-010618-arnaudb.json
- 00:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66568 and previous config saved to /var/cache/conftool/dbconfig/20240716-005111-arnaudb.json
- 00:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66567 and previous config saved to /var/cache/conftool/dbconfig/20240716-003604-arnaudb.json
- 00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66566 and previous config saved to /var/cache/conftool/dbconfig/20240716-003331-arnaudb.json
- 00:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 00:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66565 and previous config saved to /var/cache/conftool/dbconfig/20240716-003310-arnaudb.json
- 00:26 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Trade . # T369998
- 00:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki 'Dodo cham' 'Le GlitcheurHD' # T369777
- 00:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66564 and previous config saved to /var/cache/conftool/dbconfig/20240716-001802-arnaudb.json
- 00:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66563 and previous config saved to /var/cache/conftool/dbconfig/20240716-000255-arnaudb.json
2024-07-15
- 23:54 zabe@deploy1002: Finished scap: Backport for Further configurations for aewikimedia (T362529) (duration: 12m 26s)
- 23:49 zabe@deploy1002: zabe: Continuing with sync
- 23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php aewikimedia translate # T362529
- 23:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66562 and previous config saved to /var/cache/conftool/dbconfig/20240715-234748-arnaudb.json
- 23:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66561 and previous config saved to /var/cache/conftool/dbconfig/20240715-234516-arnaudb.json
- 23:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66560 and previous config saved to /var/cache/conftool/dbconfig/20240715-234454-arnaudb.json
- 23:44 zabe@deploy1002: zabe: Backport for Further configurations for aewikimedia (T362529) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:42 zabe@deploy1002: Started scap sync-world: Backport for Further configurations for aewikimedia (T362529)
- 23:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66559 and previous config saved to /var/cache/conftool/dbconfig/20240715-232947-arnaudb.json
- 23:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66558 and previous config saved to /var/cache/conftool/dbconfig/20240715-231440-arnaudb.json
- 23:11 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided) (duration: 00m 08s)
- 23:11 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided)
- 22:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66557 and previous config saved to /var/cache/conftool/dbconfig/20240715-225933-arnaudb.json
- 22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66556 and previous config saved to /var/cache/conftool/dbconfig/20240715-225701-arnaudb.json
- 22:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66555 and previous config saved to /var/cache/conftool/dbconfig/20240715-225639-arnaudb.json
- 22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66554 and previous config saved to /var/cache/conftool/dbconfig/20240715-224131-arnaudb.json
- 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66553 and previous config saved to /var/cache/conftool/dbconfig/20240715-222624-arnaudb.json
- 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66552 and previous config saved to /var/cache/conftool/dbconfig/20240715-221117-arnaudb.json
- 22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66551 and previous config saved to /var/cache/conftool/dbconfig/20240715-220845-arnaudb.json
- 22:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
- 22:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
- 22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66550 and previous config saved to /var/cache/conftool/dbconfig/20240715-220823-arnaudb.json
- 21:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66549 and previous config saved to /var/cache/conftool/dbconfig/20240715-215316-arnaudb.json
- 21:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66548 and previous config saved to /var/cache/conftool/dbconfig/20240715-213809-arnaudb.json
- 21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66547 and previous config saved to /var/cache/conftool/dbconfig/20240715-212302-arnaudb.json
- 21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66546 and previous config saved to /var/cache/conftool/dbconfig/20240715-212034-arnaudb.json
- 21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
- 21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66545 and previous config saved to /var/cache/conftool/dbconfig/20240715-211957-arnaudb.json
- 21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66544 and previous config saved to /var/cache/conftool/dbconfig/20240715-210451-arnaudb.json
- 20:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66543 and previous config saved to /var/cache/conftool/dbconfig/20240715-204944-arnaudb.json
- 20:39 catrope@deploy1002: Finished scap: Backport for Revert changes in log levels, Revert "Change Linter log level to info" (duration: 07m 41s)
- 20:35 catrope@deploy1002: arlolra, catrope: Continuing with sync
- 20:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66542 and previous config saved to /var/cache/conftool/dbconfig/20240715-203435-arnaudb.json
- 20:34 catrope@deploy1002: arlolra, catrope: Backport for Revert changes in log levels, Revert "Change Linter log level to info" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 20:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66541 and previous config saved to /var/cache/conftool/dbconfig/20240715-203233-marostegui.json
- 20:32 catrope@deploy1002: Started scap sync-world: Backport for Revert changes in log levels, Revert "Change Linter log level to info"
- 20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66540 and previous config saved to /var/cache/conftool/dbconfig/20240715-203203-arnaudb.json
- 20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 20:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66539 and previous config saved to /var/cache/conftool/dbconfig/20240715-203120-arnaudb.json
- 20:29 catrope@deploy1002: Finished scap: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) (duration: 10m 26s)
- 20:24 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
- 20:22 catrope@deploy1002: jdlrobson, catrope: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:19 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
- 20:18 catrope@deploy1002: Started scap sync-world: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795)
- 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66538 and previous config saved to /var/cache/conftool/dbconfig/20240715-201726-marostegui.json
- 20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66537 and previous config saved to /var/cache/conftool/dbconfig/20240715-201613-arnaudb.json
- 20:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66536 and previous config saved to /var/cache/conftool/dbconfig/20240715-200218-marostegui.json
- 20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66535 and previous config saved to /var/cache/conftool/dbconfig/20240715-200106-arnaudb.json
- 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66534 and previous config saved to /var/cache/conftool/dbconfig/20240715-195510-root.json
- 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66533 and previous config saved to /var/cache/conftool/dbconfig/20240715-195459-root.json
- 19:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66532 and previous config saved to /var/cache/conftool/dbconfig/20240715-194711-marostegui.json
- 19:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66531 and previous config saved to /var/cache/conftool/dbconfig/20240715-194559-arnaudb.json
- 19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66530 and previous config saved to /var/cache/conftool/dbconfig/20240715-194344-arnaudb.json
- 19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66529 and previous config saved to /var/cache/conftool/dbconfig/20240715-194257-arnaudb.json
- 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66528 and previous config saved to /var/cache/conftool/dbconfig/20240715-194004-root.json
- 19:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66527 and previous config saved to /var/cache/conftool/dbconfig/20240715-193953-root.json
- 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66526 and previous config saved to /var/cache/conftool/dbconfig/20240715-192750-arnaudb.json
- 19:25 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9ad2bec]: 0.3.144 (duration: 08m 31s)
- 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66525 and previous config saved to /var/cache/conftool/dbconfig/20240715-192458-root.json
- 19:24 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
- 19:24 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
- 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66524 and previous config saved to /var/cache/conftool/dbconfig/20240715-192448-root.json
- 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
- 19:23 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
- 19:17 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.144` on canary `wdqs1016`; proceeding to rest of fleet
- 19:16 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9ad2bec]: 0.3.144
- 19:16 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.144`. Pre-deploy tests passing on canary `wdqs1016`
- 19:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66523 and previous config saved to /var/cache/conftool/dbconfig/20240715-191243-arnaudb.json
- 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66522 and previous config saved to /var/cache/conftool/dbconfig/20240715-190953-root.json
- 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66521 and previous config saved to /var/cache/conftool/dbconfig/20240715-190942-root.json
- 18:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66520 and previous config saved to /var/cache/conftool/dbconfig/20240715-185736-arnaudb.json
- 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66519 and previous config saved to /var/cache/conftool/dbconfig/20240715-185521-arnaudb.json
- 18:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 18:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66518 and previous config saved to /var/cache/conftool/dbconfig/20240715-185459-arnaudb.json
- 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66517 and previous config saved to /var/cache/conftool/dbconfig/20240715-185447-root.json
- 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66516 and previous config saved to /var/cache/conftool/dbconfig/20240715-185437-root.json
- 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66515 and previous config saved to /var/cache/conftool/dbconfig/20240715-183952-arnaudb.json
- 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66514 and previous config saved to /var/cache/conftool/dbconfig/20240715-183942-root.json
- 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66513 and previous config saved to /var/cache/conftool/dbconfig/20240715-183931-root.json
- 18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66512 and previous config saved to /var/cache/conftool/dbconfig/20240715-182444-arnaudb.json
- 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66511 and previous config saved to /var/cache/conftool/dbconfig/20240715-182436-root.json
- 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66510 and previous config saved to /var/cache/conftool/dbconfig/20240715-182426-root.json
- 18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66509 and previous config saved to /var/cache/conftool/dbconfig/20240715-180937-arnaudb.json
- 18:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66508 and previous config saved to /var/cache/conftool/dbconfig/20240715-180726-arnaudb.json
- 18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 18:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 18:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66507 and previous config saved to /var/cache/conftool/dbconfig/20240715-180640-arnaudb.json
- 18:04 herron: upgraded prometheus-ipmi-exporter to 1.8.0 T368088
- 17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66506 and previous config saved to /var/cache/conftool/dbconfig/20240715-175133-arnaudb.json
- 17:41 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 10s)
- 17:40 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
- 17:38 ejegg: Fundraising python tools upgraded from 94bac5c6 to 490a7b3f
- 17:37 ejegg: SmashPig upgraded from 565c61e4 to f2aca230
- 17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66505 and previous config saved to /var/cache/conftool/dbconfig/20240715-173625-arnaudb.json
- 17:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66504 and previous config saved to /var/cache/conftool/dbconfig/20240715-172118-arnaudb.json
- 17:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66503 and previous config saved to /var/cache/conftool/dbconfig/20240715-171908-arnaudb.json
- 17:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 17:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240715-171841-arnaudb.json
- 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66501 and previous config saved to /var/cache/conftool/dbconfig/20240715-170334-arnaudb.json
- 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66500 and previous config saved to /var/cache/conftool/dbconfig/20240715-164827-arnaudb.json
- 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66499 and previous config saved to /var/cache/conftool/dbconfig/20240715-163320-arnaudb.json
- 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66498 and previous config saved to /var/cache/conftool/dbconfig/20240715-163110-arnaudb.json
- 16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 16:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66497 and previous config saved to /var/cache/conftool/dbconfig/20240715-163048-arnaudb.json
- 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66496 and previous config saved to /var/cache/conftool/dbconfig/20240715-161541-arnaudb.json
- 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66495 and previous config saved to /var/cache/conftool/dbconfig/20240715-160033-arnaudb.json
- 15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66494 and previous config saved to /var/cache/conftool/dbconfig/20240715-154526-arnaudb.json
- 15:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66493 and previous config saved to /var/cache/conftool/dbconfig/20240715-154312-arnaudb.json
- 15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66492 and previous config saved to /var/cache/conftool/dbconfig/20240715-154250-arnaudb.json
- 15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 15:31 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66491 and previous config saved to /var/cache/conftool/dbconfig/20240715-152742-arnaudb.json
- 15:17 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:14 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 31s)
- 15:13 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
- 15:13 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66490 and previous config saved to /var/cache/conftool/dbconfig/20240715-151235-arnaudb.json
- 15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 15:09 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66489 and previous config saved to /var/cache/conftool/dbconfig/20240715-145728-arnaudb.json
- 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66488 and previous config saved to /var/cache/conftool/dbconfig/20240715-145517-arnaudb.json
- 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66487 and previous config saved to /var/cache/conftool/dbconfig/20240715-145455-arnaudb.json
- 14:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
- 14:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
- 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66486 and previous config saved to /var/cache/conftool/dbconfig/20240715-143948-arnaudb.json
- 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66485 and previous config saved to /var/cache/conftool/dbconfig/20240715-142441-arnaudb.json
- 14:16 _joe_: updating conftool to 3.1.0 fleet wide
- 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
- 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66484 and previous config saved to /var/cache/conftool/dbconfig/20240715-140934-arnaudb.json
- 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66483 and previous config saved to /var/cache/conftool/dbconfig/20240715-140720-arnaudb.json
- 14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 14:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 14:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
- 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
- 13:53 oblivian@puppetmaster2001: conftool action : set/pooled=yes; selector: name=mw1386.*,cluster=kubernetes,dc=eqiad [reason: Test conftool sal logging]
- 13:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 13:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 13:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
- 13:50 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
- 13:45 _joe_: uploading conftool 3.1.0 to bookworm,bullseye,buster
- 13:41 Lucas_WMDE: UTC afternoon backport+config window done
- 13:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
- 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) (duration: 30m 51s)
- 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:02 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495)
- 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:40 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:16 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:15 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 11:32 marostegui: test
- 11:31 marostegui: Reboot stashbot
- 11:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 11:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 11:11 claime: Increasing webVideoTranscodePrioritized concurrency in changeprop-jobqueue
- 11:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 11:08 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66480 and previous config saved to /var/cache/conftool/dbconfig/20240715-102117-marostegui.json
- 10:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 09:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52999
- 09:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52999
- 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270361
- 09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270361
- 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262293
- 09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262293
- 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61941
- 09:57 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61941
- 09:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
- 09:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
- 09:29 claime: manually removing mw1349.eqiad.wmnet mw1350.eqiad.wmnet mw1351.eqiad.wmnet from k8s following reimage to videoscalers - T351074
- 09:25 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:22 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 09:19 marostegui: Deploy schema change on s7 eqiad db1170 dbmaint T367856
- 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
- 09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
- 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66479 and previous config saved to /var/cache/conftool/dbconfig/20240715-091800-marostegui.json
- 09:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
- 09:15 marostegui: Deploy schema change on s7 codfw db2121 dbmaint T367856
- 09:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
- 09:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
- 09:14 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
- 09:05 volans@cumin1002: dbctl commit (dc=all): 'Depool db2121 T369882', diff saved to https://phabricator.wikimedia.org/P66478 and previous config saved to /var/cache/conftool/dbconfig/20240715-090532-volans.json
- 08:56 volans@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T369882', diff saved to https://phabricator.wikimedia.org/P66477 and previous config saved to /var/cache/conftool/dbconfig/20240715-085654-volans.json
- 08:51 volans: Starting s7 codfw failover from db2121 to db2218 - T369882
- 08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2004.wikimedia.org
- 08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2004.wikimedia.org with OS bookworm
- 08:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52468
- 08:21 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52468
- 08:16 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
- 08:13 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
- 08:12 volans@cumin2002: dbctl commit (dc=all): 'Remove db2218 from API T369882', diff saved to https://phabricator.wikimedia.org/P66475 and previous config saved to /var/cache/conftool/dbconfig/20240715-081252-volans.json
- 08:09 volans@cumin2002: dbctl commit (dc=all): 'Set db2218 with weight 0 T369882', diff saved to https://phabricator.wikimedia.org/P66474 and previous config saved to /var/cache/conftool/dbconfig/20240715-080948-volans.json
- 08:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
- 08:04 volans@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
- 07:58 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
- 07:57 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
- 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2004.wikimedia.org on all recursors
- 07:57 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp2004.wikimedia.org on all recursors
- 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
- 07:55 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
- 07:53 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
- 07:53 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp2004.wikimedia.org
- 07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1004.wikimedia.org
- 07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1004.wikimedia.org with OS bookworm
- 07:21 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
- 07:17 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
- 07:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
- 07:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
- 07:06 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp1004.wikimedia.org with OS bookworm
- 07:05 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
- 07:04 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
- 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1004.wikimedia.org on all recursors
- 07:04 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp1004.wikimedia.org on all recursors
- 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
- 07:03 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
- 07:01 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
- 07:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp1004.wikimedia.org
- 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2136', diff saved to https://phabricator.wikimedia.org/P66473 and previous config saved to /var/cache/conftool/dbconfig/20240715-062216-root.json
- 06:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 06:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 06:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 06:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 06:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 06:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
- 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66472 and previous config saved to /var/cache/conftool/dbconfig/20240715-044723-marostegui.json
- 04:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 04:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
- 04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
- 04:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
- 04:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 02:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 02:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66471 and previous config saved to /var/cache/conftool/dbconfig/20240715-021121-marostegui.json
- 01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66470 and previous config saved to /var/cache/conftool/dbconfig/20240715-015613-marostegui.json
- 01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66469 and previous config saved to /var/cache/conftool/dbconfig/20240715-014106-marostegui.json
- 01:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66467 and previous config saved to /var/cache/conftool/dbconfig/20240715-012559-marostegui.json
2024-07-14
- 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66466 and previous config saved to /var/cache/conftool/dbconfig/20240714-223146-marostegui.json
- 22:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 22:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66465 and previous config saved to /var/cache/conftool/dbconfig/20240714-223124-marostegui.json
- 22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66464 and previous config saved to /var/cache/conftool/dbconfig/20240714-221617-marostegui.json
- 22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66463 and previous config saved to /var/cache/conftool/dbconfig/20240714-220110-marostegui.json
- 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66462 and previous config saved to /var/cache/conftool/dbconfig/20240714-214603-marostegui.json
- 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66461 and previous config saved to /var/cache/conftool/dbconfig/20240714-175827-root.json
- 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66460 and previous config saved to /var/cache/conftool/dbconfig/20240714-174322-root.json
- 17:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66459 and previous config saved to /var/cache/conftool/dbconfig/20240714-172816-root.json
- 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66458 and previous config saved to /var/cache/conftool/dbconfig/20240714-171311-root.json
- 16:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66457 and previous config saved to /var/cache/conftool/dbconfig/20240714-165805-root.json
- 16:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66456 and previous config saved to /var/cache/conftool/dbconfig/20240714-164300-root.json
- 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 16:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
- 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66455 and previous config saved to /var/cache/conftool/dbconfig/20240714-162755-root.json
- 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66454 and previous config saved to /var/cache/conftool/dbconfig/20240714-140046-marostegui.json
- 14:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 14:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66453 and previous config saved to /var/cache/conftool/dbconfig/20240714-140024-marostegui.json
- 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66452 and previous config saved to /var/cache/conftool/dbconfig/20240714-134517-marostegui.json
- 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66451 and previous config saved to /var/cache/conftool/dbconfig/20240714-133010-marostegui.json
- 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66450 and previous config saved to /var/cache/conftool/dbconfig/20240714-131502-marostegui.json
- 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66449 and previous config saved to /var/cache/conftool/dbconfig/20240714-093540-marostegui.json
- 09:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 09:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66448 and previous config saved to /var/cache/conftool/dbconfig/20240714-093518-marostegui.json
- 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66447 and previous config saved to /var/cache/conftool/dbconfig/20240714-092011-marostegui.json
- 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66446 and previous config saved to /var/cache/conftool/dbconfig/20240714-090504-marostegui.json
- 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66445 and previous config saved to /var/cache/conftool/dbconfig/20240714-084956-marostegui.json
- 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66444 and previous config saved to /var/cache/conftool/dbconfig/20240714-084903-marostegui.json
- 08:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 08:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66443 and previous config saved to /var/cache/conftool/dbconfig/20240714-054611-marostegui.json
- 05:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66442 and previous config saved to /var/cache/conftool/dbconfig/20240714-054549-marostegui.json
- 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66441 and previous config saved to /var/cache/conftool/dbconfig/20240714-053042-marostegui.json
- 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66440 and previous config saved to /var/cache/conftool/dbconfig/20240714-051535-marostegui.json
- 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66439 and previous config saved to /var/cache/conftool/dbconfig/20240714-050027-marostegui.json
- 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66438 and previous config saved to /var/cache/conftool/dbconfig/20240714-015901-marostegui.json
- 01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66437 and previous config saved to /var/cache/conftool/dbconfig/20240714-015838-marostegui.json
- 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66436 and previous config saved to /var/cache/conftool/dbconfig/20240714-014331-marostegui.json
- 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66435 and previous config saved to /var/cache/conftool/dbconfig/20240714-012824-marostegui.json
- 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66434 and previous config saved to /var/cache/conftool/dbconfig/20240714-011317-marostegui.json
- 00:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66433 and previous config saved to /var/cache/conftool/dbconfig/20240714-001301-marostegui.json
- 00:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
2024-07-13
- 15:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 15:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66432 and previous config saved to /var/cache/conftool/dbconfig/20240713-155158-marostegui.json
- 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66431 and previous config saved to /var/cache/conftool/dbconfig/20240713-153650-marostegui.json
- 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66430 and previous config saved to /var/cache/conftool/dbconfig/20240713-152143-marostegui.json
- 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66429 and previous config saved to /var/cache/conftool/dbconfig/20240713-150636-marostegui.json
- 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66428 and previous config saved to /var/cache/conftool/dbconfig/20240713-140620-marostegui.json
- 14:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 14:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 13:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 13:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66427 and previous config saved to /var/cache/conftool/dbconfig/20240713-061928-marostegui.json
- 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66426 and previous config saved to /var/cache/conftool/dbconfig/20240713-060421-marostegui.json
- 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66425 and previous config saved to /var/cache/conftool/dbconfig/20240713-054913-marostegui.json
- 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66424 and previous config saved to /var/cache/conftool/dbconfig/20240713-053406-marostegui.json
- 01:33 tzatziki: removing 2 files for legal compliance
- 01:22 tzatziki: removing 16 files for legal compliance
- 00:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66423 and previous config saved to /var/cache/conftool/dbconfig/20240713-000433-marostegui.json
2024-07-12
- 23:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66422 and previous config saved to /var/cache/conftool/dbconfig/20240712-234926-marostegui.json
- 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66421 and previous config saved to /var/cache/conftool/dbconfig/20240712-233419-marostegui.json
- 23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66420 and previous config saved to /var/cache/conftool/dbconfig/20240712-231912-marostegui.json
- 22:34 tzatziki: removing 1 file for legal compliance
- 22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66419 and previous config saved to /var/cache/conftool/dbconfig/20240712-223226-marostegui.json
- 22:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 22:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66418 and previous config saved to /var/cache/conftool/dbconfig/20240712-223204-marostegui.json
- 22:21 tzatziki: removing 1 file for legal compliance
- 22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66417 and previous config saved to /var/cache/conftool/dbconfig/20240712-221656-marostegui.json
- 22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66416 and previous config saved to /var/cache/conftool/dbconfig/20240712-220149-marostegui.json
- 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66415 and previous config saved to /var/cache/conftool/dbconfig/20240712-214642-marostegui.json
- 19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66414 and previous config saved to /var/cache/conftool/dbconfig/20240712-190224-marostegui.json
- 19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 19:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66413 and previous config saved to /var/cache/conftool/dbconfig/20240712-190154-marostegui.json
- 18:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66412 and previous config saved to /var/cache/conftool/dbconfig/20240712-184647-marostegui.json
- 18:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66411 and previous config saved to /var/cache/conftool/dbconfig/20240712-183140-marostegui.json
- 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66410 and previous config saved to /var/cache/conftool/dbconfig/20240712-181632-marostegui.json
- 17:10 hnowlan@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet)
- 17:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
- 17:07 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
- 17:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1350-1351].eqiad.wmnet
- 17:07 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw[1350-1351].eqiad.wmnet
- 17:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
- 17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
- 17:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
- 16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
- 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
- 16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
- 16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
- 16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
- 16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
- 16:17 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:16 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
- 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
- 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
- 16:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
- 16:05 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
- 16:04 claime: pooling mw1349, mw1350, mw1351 as jobrunners
- 16:03 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
- 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
- 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1350.eqiad.wmnet
- 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1350.eqiad.wmnet
- 16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
- 16:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
- 15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
- 15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
- 15:57 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=jobrunner
- 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
- 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66408 and previous config saved to /var/cache/conftool/dbconfig/20240712-154954-marostegui.json
- 15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66407 and previous config saved to /var/cache/conftool/dbconfig/20240712-154921-marostegui.json
- 15:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
- 15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66406 and previous config saved to /var/cache/conftool/dbconfig/20240712-153414-marostegui.json
- 15:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
- 15:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 15:26 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
- 15:25 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
- 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
- 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
- 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
- 15:20 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:20 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66405 and previous config saved to /var/cache/conftool/dbconfig/20240712-151907-marostegui.json
- 15:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:15 hnowlan: homer 'cr*eqiad*' commit 'videoscaler reimages mw1349/mw135[01]'
- 15:08 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:07 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
- 15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
- 15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
- 15:04 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:04 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66404 and previous config saved to /var/cache/conftool/dbconfig/20240712-150400-marostegui.json
- 15:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:02 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 14:58 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 14:55 claime: Draining and depooling mw1349, mw1350, mw1351 for reimage as jobrunners
- 14:36 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
- 14:34 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
- 14:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 14:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 14:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 13:22 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:19 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:12 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
- 13:10 topranks: pushing updated BGP policy to cr2-eqord and cr2-eqdfw to announce Anycast ranges from network pops (T367439)
- 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66396 and previous config saved to /var/cache/conftool/dbconfig/20240712-102416-arnaudb.json
- 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66395 and previous config saved to /var/cache/conftool/dbconfig/20240712-102243-marostegui.json
- 10:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 10:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66394 and previous config saved to /var/cache/conftool/dbconfig/20240712-102221-marostegui.json
- 10:18 godog: stop benthos@webrequest_live on centrallog2002 and start it on centrallog1002 - T369737
- 10:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66393 and previous config saved to /var/cache/conftool/dbconfig/20240712-100910-arnaudb.json
- 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66392 and previous config saved to /var/cache/conftool/dbconfig/20240712-100714-marostegui.json
- 09:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66391 and previous config saved to /var/cache/conftool/dbconfig/20240712-095405-arnaudb.json
- 09:53 godog: temp stop benthos@webrequest_live on centrallog1002 - T369737
- 09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66389 and previous config saved to /var/cache/conftool/dbconfig/20240712-095207-marostegui.json
- 09:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66388 and previous config saved to /var/cache/conftool/dbconfig/20240712-093900-arnaudb.json
- 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66387 and previous config saved to /var/cache/conftool/dbconfig/20240712-093700-marostegui.json
- 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66386 and previous config saved to /var/cache/conftool/dbconfig/20240712-092354-arnaudb.json
- 09:20 dcausse@deploy1002: Finished scap: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) (duration: 09m 44s)
- 09:15 dcausse@deploy1002: dcausse: Continuing with sync
- 09:13 dcausse@deploy1002: dcausse: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:10 dcausse@deploy1002: Started scap sync-world: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033)
- 09:10 elukey: upgrade httpd version in production (bullseye/bookworm) for T369885
- 09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66385 and previous config saved to /var/cache/conftool/dbconfig/20240712-090849-arnaudb.json
- 09:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66384 and previous config saved to /var/cache/conftool/dbconfig/20240712-090527-arnaudb.json
- 09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 09:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 08:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
- 08:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
- 08:42 godog: tweak benthos@webrequest_live output batching on centrallog2001 - T369737
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66383 and previous config saved to /var/cache/conftool/dbconfig/20240712-083644-marostegui.json
- 08:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 08:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66382 and previous config saved to /var/cache/conftool/dbconfig/20240712-083621-marostegui.json
- 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66381 and previous config saved to /var/cache/conftool/dbconfig/20240712-082114-marostegui.json
- 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66380 and previous config saved to /var/cache/conftool/dbconfig/20240712-080607-marostegui.json
- 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66379 and previous config saved to /var/cache/conftool/dbconfig/20240712-075100-marostegui.json
- 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66377 and previous config saved to /var/cache/conftool/dbconfig/20240712-073102-marostegui.json
- 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
- 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
- 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66376 and previous config saved to /var/cache/conftool/dbconfig/20240712-073040-marostegui.json
- 07:30 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 07:24 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66375 and previous config saved to /var/cache/conftool/dbconfig/20240712-071533-marostegui.json
- 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66374 and previous config saved to /var/cache/conftool/dbconfig/20240712-070026-marostegui.json
- 06:37 Dreamy_Jazz: Starting MediaModeration scan on commons after it crashed last night due to database issues - https://wikitech.wikimedia.org/wiki/MediaModeration
- 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66372 and previous config saved to /var/cache/conftool/dbconfig/20240712-061835-root.json
- 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66371 and previous config saved to /var/cache/conftool/dbconfig/20240712-060329-root.json
- 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66370 and previous config saved to /var/cache/conftool/dbconfig/20240712-054824-root.json
- 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66369 and previous config saved to /var/cache/conftool/dbconfig/20240712-053318-root.json
- 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66368 and previous config saved to /var/cache/conftool/dbconfig/20240712-051813-root.json
- 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66367 and previous config saved to /var/cache/conftool/dbconfig/20240712-050800-root.json
- 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66366 and previous config saved to /var/cache/conftool/dbconfig/20240712-050307-root.json
- 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66365 and previous config saved to /var/cache/conftool/dbconfig/20240712-044802-root.json
- 03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb2003.codfw.wmnet
- 03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netboxdb2003.codfw.wmnet with OS bookworm
- 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66364 and previous config saved to /var/cache/conftool/dbconfig/20240712-000131-marostegui.json
- 00:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 00:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66363 and previous config saved to /var/cache/conftool/dbconfig/20240712-000109-marostegui.json
2024-07-11
- 23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66362 and previous config saved to /var/cache/conftool/dbconfig/20240711-234602-marostegui.json
- 23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66361 and previous config saved to /var/cache/conftool/dbconfig/20240711-233712-arnaudb.json
- 23:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66360 and previous config saved to /var/cache/conftool/dbconfig/20240711-233054-marostegui.json
- 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66359 and previous config saved to /var/cache/conftool/dbconfig/20240711-232218-marostegui.json
- 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66358 and previous config saved to /var/cache/conftool/dbconfig/20240711-232205-arnaudb.json
- 23:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 23:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66357 and previous config saved to /var/cache/conftool/dbconfig/20240711-231547-marostegui.json
- 23:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66356 and previous config saved to /var/cache/conftool/dbconfig/20240711-230657-arnaudb.json
- 23:06 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 37s)
- 22:59 zabe@deploy1002: Started scap sync-world: update interwiki cache
- 22:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66355 and previous config saved to /var/cache/conftool/dbconfig/20240711-225150-arnaudb.json
- 22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66354 and previous config saved to /var/cache/conftool/dbconfig/20240711-224858-arnaudb.json
- 22:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 22:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
- 22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66353 and previous config saved to /var/cache/conftool/dbconfig/20240711-224836-arnaudb.json
- 22:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66352 and previous config saved to /var/cache/conftool/dbconfig/20240711-223329-arnaudb.json
- 22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
- 22:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
- 22:23 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 22:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66351 and previous config saved to /var/cache/conftool/dbconfig/20240711-221822-arnaudb.json
- 22:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66350 and previous config saved to /var/cache/conftool/dbconfig/20240711-220315-arnaudb.json
- 21:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66349 and previous config saved to /var/cache/conftool/dbconfig/20240711-215921-arnaudb.json
- 21:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 21:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 21:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 21:57 rzl: systemctl restart apache2 on mwdebug1002, mwdebug2001, mwdebug2002 for https://gerrit.wikimedia.org/r/1052128
- 21:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 21:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66348 and previous config saved to /var/cache/conftool/dbconfig/20240711-215700-arnaudb.json
- 21:44 rzl: rzl@mwdebug1002:~$ sudo apache2ctl restart
- 21:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66347 and previous config saved to /var/cache/conftool/dbconfig/20240711-214153-arnaudb.json
- 21:38 jhathaway: upgrading exim4 to 4.94.2-7+deb11u3
- 21:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66346 and previous config saved to /var/cache/conftool/dbconfig/20240711-212646-arnaudb.json
- 21:13 catrope@deploy1002: Finished scap: Backport for Change Linter log level to info (duration: 14m 40s)
- 21:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 21:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66345 and previous config saved to /var/cache/conftool/dbconfig/20240711-211138-arnaudb.json
- 21:11 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 21:08 catrope@deploy1002: arlolra, catrope: Continuing with sync
- 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66344 and previous config saved to /var/cache/conftool/dbconfig/20240711-210747-arnaudb.json
- 21:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 21:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66343 and previous config saved to /var/cache/conftool/dbconfig/20240711-210725-arnaudb.json
- 21:05 catrope@deploy1002: arlolra, catrope: Backport for Change Linter log level to info synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:59 catrope@deploy1002: Started scap sync-world: Backport for Change Linter log level to info
- 20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66342 and previous config saved to /var/cache/conftool/dbconfig/20240711-205218-arnaudb.json
- 20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66341 and previous config saved to /var/cache/conftool/dbconfig/20240711-203711-arnaudb.json
- 20:37 catrope@deploy1002: Finished scap: Backport for Vector theme should default to day (T369833) (duration: 17m 09s)
- 20:32 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
- 20:30 catrope@deploy1002: jdlrobson, catrope: Backport for Vector theme should default to day (T369833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
- 20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 20:26 eileen: config revision changed from 540f27e6 to c25da839 re-enable silverpop_daily
- 20:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66340 and previous config saved to /var/cache/conftool/dbconfig/20240711-202204-arnaudb.json
- 20:19 catrope@deploy1002: Started scap sync-world: Backport for Vector theme should default to day (T369833)
- 20:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66339 and previous config saved to /var/cache/conftool/dbconfig/20240711-201815-arnaudb.json
- 20:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 20:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 20:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66338 and previous config saved to /var/cache/conftool/dbconfig/20240711-201753-arnaudb.json
- 20:15 catrope@deploy1002: Finished scap: Backport for Graph: Fix JSON parse errors in Graph data source tracking (duration: 13m 32s)
- 20:10 catrope@deploy1002: catrope: Continuing with sync
- 20:08 catrope@deploy1002: catrope: Backport for Graph: Fix JSON parse errors in Graph data source tracking synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66337 and previous config saved to /var/cache/conftool/dbconfig/20240711-200246-arnaudb.json
- 20:01 catrope@deploy1002: Started scap sync-world: Backport for Graph: Fix JSON parse errors in Graph data source tracking
- 19:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66336 and previous config saved to /var/cache/conftool/dbconfig/20240711-194739-arnaudb.json
- 19:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66335 and previous config saved to /var/cache/conftool/dbconfig/20240711-193231-arnaudb.json
- 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66334 and previous config saved to /var/cache/conftool/dbconfig/20240711-192842-arnaudb.json
- 19:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 19:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66333 and previous config saved to /var/cache/conftool/dbconfig/20240711-192820-arnaudb.json
- 19:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66332 and previous config saved to /var/cache/conftool/dbconfig/20240711-191313-arnaudb.json
- 19:12 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 19:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
- 19:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
- 18:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66331 and previous config saved to /var/cache/conftool/dbconfig/20240711-185805-arnaudb.json
- 18:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
- 18:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66330 and previous config saved to /var/cache/conftool/dbconfig/20240711-184258-arnaudb.json
- 18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66329 and previous config saved to /var/cache/conftool/dbconfig/20240711-184009-arnaudb.json
- 18:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 18:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66328 and previous config saved to /var/cache/conftool/dbconfig/20240711-183946-arnaudb.json
- 18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66327 and previous config saved to /var/cache/conftool/dbconfig/20240711-182438-arnaudb.json
- 18:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
- 18:15 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66326 and previous config saved to /var/cache/conftool/dbconfig/20240711-180931-arnaudb.json
- 18:00 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1001.eqiad.wmnet
- 18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
- 17:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66325 and previous config saved to /var/cache/conftool/dbconfig/20240711-175424-arnaudb.json
- 17:52 daniel@deploy1002: Finished scap: Backport for Enable Special:RestSandbox on testwiki (T362006) (duration: 11m 01s)
- 17:52 rzl@cumin2002: dbctl commit (dc=all): 'db1179 depooled', diff saved to https://phabricator.wikimedia.org/P66324 and previous config saved to /var/cache/conftool/dbconfig/20240711-175212-rzl.json
- 17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66322 and previous config saved to /var/cache/conftool/dbconfig/20240711-175038-arnaudb.json
- 17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 17:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66321 and previous config saved to /var/cache/conftool/dbconfig/20240711-174820-arnaudb.json
- 17:47 daniel@deploy1002: daniel: Continuing with sync
- 17:46 daniel@deploy1002: daniel: Backport for Enable Special:RestSandbox on testwiki (T362006) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:41 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
- 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66319 and previous config saved to /var/cache/conftool/dbconfig/20240711-173313-arnaudb.json
- 17:28 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
- 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66318 and previous config saved to /var/cache/conftool/dbconfig/20240711-171806-arnaudb.json
- 17:10 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
- 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 17:07 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 17:06 mutante: puppetmaster1001 - puppet cert clean aphlict..discovery.wmnet T369796 T360413
- 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66317 and previous config saved to /var/cache/conftool/dbconfig/20240711-170258-arnaudb.json
- 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66316 and previous config saved to /var/cache/conftool/dbconfig/20240711-170030-arnaudb.json
- 17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66315 and previous config saved to /var/cache/conftool/dbconfig/20240711-170007-arnaudb.json
- 16:58 mutante: puppetmaster1001 - puppet cert clean phabricator.discovery.wmnet T369796 T360413
- 16:58 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:58 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:46 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:46 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66314 and previous config saved to /var/cache/conftool/dbconfig/20240711-164500-arnaudb.json
- 16:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 16:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66313 and previous config saved to /var/cache/conftool/dbconfig/20240711-162953-arnaudb.json
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66312 and previous config saved to /var/cache/conftool/dbconfig/20240711-161446-arnaudb.json
- 16:13 ejegg: payments-wiki upgraded from 4e48059a to c8edeb8e
- 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66311 and previous config saved to /var/cache/conftool/dbconfig/20240711-161219-arnaudb.json
- 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66310 and previous config saved to /var/cache/conftool/dbconfig/20240711-161157-arnaudb.json
- 16:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 16:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66309 and previous config saved to /var/cache/conftool/dbconfig/20240711-155649-arnaudb.json
- 15:53 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:52 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:51 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66308 and previous config saved to /var/cache/conftool/dbconfig/20240711-155109-arnaudb.json
- 15:48 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
- 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66307 and previous config saved to /var/cache/conftool/dbconfig/20240711-154142-arnaudb.json
- 15:41 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:40 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:36 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
- 15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66306 and previous config saved to /var/cache/conftool/dbconfig/20240711-153604-arnaudb.json
- 15:31 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66305 and previous config saved to /var/cache/conftool/dbconfig/20240711-152946-marostegui.json
- 15:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 15:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66304 and previous config saved to /var/cache/conftool/dbconfig/20240711-152635-arnaudb.json
- 15:26 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
- 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66303 and previous config saved to /var/cache/conftool/dbconfig/20240711-152412-arnaudb.json
- 15:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66302 and previous config saved to /var/cache/conftool/dbconfig/20240711-152350-arnaudb.json
- 15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:22 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
- 15:21 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66301 and previous config saved to /var/cache/conftool/dbconfig/20240711-152058-arnaudb.json
- 15:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 15:20 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
- 15:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
- 15:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 15:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 15:12 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 15:11 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66300 and previous config saved to /var/cache/conftool/dbconfig/20240711-150843-arnaudb.json
- 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66299 and previous config saved to /var/cache/conftool/dbconfig/20240711-150553-arnaudb.json
- 15:03 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 15:01 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 15:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 14:55 Emperor: repool ms-fe1014 and thanos-fe1004 before switch work T365996
- 14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66298 and previous config saved to /var/cache/conftool/dbconfig/20240711-145336-arnaudb.json
- 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66297 and previous config saved to /var/cache/conftool/dbconfig/20240711-145047-arnaudb.json
- 14:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 14:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 14:42 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 14:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 14:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 14:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 14:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66296 and previous config saved to /var/cache/conftool/dbconfig/20240711-143829-arnaudb.json
- 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66295 and previous config saved to /var/cache/conftool/dbconfig/20240711-143606-arnaudb.json
- 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 14:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66294 and previous config saved to /var/cache/conftool/dbconfig/20240711-143541-arnaudb.json
- 14:35 godog: pool titan1001 for switch work T365996
- 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
- 14:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
- 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'T365996 - depool db1193 - s8', diff saved to https://phabricator.wikimedia.org/P66293 and previous config saved to /var/cache/conftool/dbconfig/20240711-142544-arnaudb.json
- 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66292 and previous config saved to /var/cache/conftool/dbconfig/20240711-142037-arnaudb.json
- 14:19 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
- 14:19 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
- 14:15 topranks: rebooting lsw1-f1-eqiad to install updated JunOS version T365996
- 14:12 godog: depool titan1001 for switch work T365996
- 14:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
- 14:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
- 14:09 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
- 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
- 14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
- 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
- 14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66291 and previous config saved to /var/cache/conftool/dbconfig/20240711-140530-arnaudb.json
- 13:56 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:52 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66290 and previous config saved to /var/cache/conftool/dbconfig/20240711-135023-arnaudb.json
- 13:50 Emperor: depool ms-fe1014 and thanos-fe1004 before switch work T365996
- 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66289 and previous config saved to /var/cache/conftool/dbconfig/20240711-134759-arnaudb.json
- 13:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 13:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
- 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66288 and previous config saved to /var/cache/conftool/dbconfig/20240711-134737-arnaudb.json
- 13:44 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 13:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66287 and previous config saved to /var/cache/conftool/dbconfig/20240711-133229-arnaudb.json
- 13:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
- 13:26 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:20 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
- 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66286 and previous config saved to /var/cache/conftool/dbconfig/20240711-131721-arnaudb.json
- 13:14 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 13:14 claime: Uncordoning and repooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet, which were not actually affected by T365996
- 13:13 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:12 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
- 13:10 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:08 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 13:05 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:04 claime: Cordoning and depooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet for T365996
- 13:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
- 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
- 13:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66285 and previous config saved to /var/cache/conftool/dbconfig/20240711-130214-arnaudb.json
- 13:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66284 and previous config saved to /var/cache/conftool/dbconfig/20240711-125949-arnaudb.json
- 12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 12:55 godog: reenable benthos@webrequest_live on centrallog2002 - T369737
- 12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
- 12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
- 12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
- 12:51 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 12:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
- 12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
- 12:50 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:50 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:50 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:50 claime: running puppet on O:analytics_cluster::turnilo,O:analytics_cluster::turnilo::staging
- 12:48 godog: temp stop benthos@webrequest_live on centrallog2002 - T369737
- 12:47 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
- 12:43 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:42 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
- 12:39 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
- 12:30 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netboxdb2003.codfw.wmnet with OS bookworm
- 12:30 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
- 12:29 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
- 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb2003.codfw.wmnet on all recursors
- 12:28 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netboxdb2003.codfw.wmnet on all recursors
- 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
- 12:28 dcausse@deploy1002: Finished deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet (duration: 00m 21s)
- 12:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet
- 12:27 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
- 12:24 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
- 12:24 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netboxdb2003.codfw.wmnet
- 11:50 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netboxdb1003.eqiad.wmnet
- 11:50 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netboxdb1003.eqiad.wmnet with OS bookworm
- 11:49 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 11:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox2003.codfw.wmnet
- 11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox2003.codfw.wmnet with OS bookworm
- 11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netboxdb1003.eqiad.wmnet with OS bookworm
- 11:29 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
- 11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
- 11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
- 11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
- 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 11:28 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
- 11:28 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
- 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
- 11:28 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
- 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
- 11:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
- 11:24 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 11:24 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
- 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
- 11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
- 11:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
- 11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
- 11:02 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb1003.eqiad.wmnet
- 11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
- 11:02 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
- 11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:00 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 11:00 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:58 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
- 10:58 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
- 10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:57 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 10:56 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 10:53 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 10:53 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
- 10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox1003.eqiad.wmnet
- 10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox1003.eqiad.wmnet with OS bookworm
- 10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 10:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 10:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 10:37 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 10:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:27 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 10:12 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:12 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 10:01 sukhe: [end] authdns-update for sending BR to magru: T359054
- 10:00 sukhe: [start] authdns-update for sending BR to magru: T359054
- 09:54 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:54 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 09:53 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:53 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:45 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
- 09:42 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
- 09:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 09:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 09:25 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netbox2003.codfw.wmnet with OS bookworm
- 09:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
- 09:23 jiji@deploy1002: Finished scap: Remove mcrouter container and exporter from mediawiki pods (duration: 04m 33s)
- 09:23 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
- 09:22 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
- 09:22 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
- 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox2003.codfw.wmnet on all recursors
- 09:22 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netbox2003.codfw.wmnet on all recursors
- 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
- 09:20 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
- 09:19 jiji@deploy1002: Started scap sync-world: Remove mcrouter container and exporter from mediawiki pods
- 09:18 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
- 09:18 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netbox2003.codfw.wmnet
- 09:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 09:11 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox1003.eqiad.wmnet with OS bookworm
- 09:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
- 09:09 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
- 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox1003.eqiad.wmnet on all recursors
- 09:09 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox1003.eqiad.wmnet on all recursors
- 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
- 09:08 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
- 09:05 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 09:05 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox1003.eqiad.wmnet
- 09:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 09:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 09:02 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 09:00 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 08:57 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 08:57 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 08:55 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 08:55 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 08:46 elukey: cd /srv/git/private; git reset --hard HEAD^ on puppetserver1001 to remove my last local commit (test before migration of the private repo to puppetserver1001) - T368023
- 08:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66280 and previous config saved to /var/cache/conftool/dbconfig/20240711-084151-marostegui.json
- 08:30 hashar: Switched CI Quibble and Phan jobs based on PHP 8.1, 8.2 and 8.3 from Buster to Bullseye - T335766 T366799 T369146
- 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66279 and previous config saved to /var/cache/conftool/dbconfig/20240711-082644-marostegui.json
- 08:15 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.13 refs T366958
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66278 and previous config saved to /var/cache/conftool/dbconfig/20240711-081137-marostegui.json
- 08:05 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
- 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66277 and previous config saved to /var/cache/conftool/dbconfig/20240711-075630-marostegui.json
- 07:50 marostegui: Deploy schema change on s3 codfw db2127 dbmaint T367856
- 07:48 dcausse: closing the backport window
- 07:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
- 07:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
- 07:47 dcausse@deploy1002: Finished scap: Backport for Fix pool counter metric (duration: 09m 56s)
- 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2127 T369691', diff saved to https://phabricator.wikimedia.org/P66276 and previous config saved to /var/cache/conftool/dbconfig/20240711-074629-marostegui.json
- 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary T369691', diff saved to https://phabricator.wikimedia.org/P66275 and previous config saved to /var/cache/conftool/dbconfig/20240711-074534-marostegui.json
- 07:45 marostegui: Starting s3 codfw failover from db2127 to db2205 - T369691
- 07:42 dcausse@deploy1002: dcausse: Continuing with sync
- 07:41 dcausse@deploy1002: dcausse: Backport for Fix pool counter metric synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:37 dcausse@deploy1002: Started scap sync-world: Backport for Fix pool counter metric
- 07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
- 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T369691', diff saved to https://phabricator.wikimedia.org/P66274 and previous config saved to /var/cache/conftool/dbconfig/20240711-073101-root.json
- 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
- 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 07:28 jgiannelos@deploy1002: Finished scap: Backport for Linter: trigger parsoid parses on template changes (T361013) (duration: 14m 25s)
- 07:23 jgiannelos@deploy1002: daniel, jgiannelos: Continuing with sync
- 07:17 jgiannelos@deploy1002: daniel, jgiannelos: Backport for Linter: trigger parsoid parses on template changes (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:14 jgiannelos@deploy1002: Started scap sync-world: Backport for Linter: trigger parsoid parses on template changes (T361013)
- 07:12 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) (duration: 09m 32s)
- 07:07 kartik@deploy1002: kartik: Continuing with sync
- 07:05 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:02 kartik@deploy1002: Started scap sync-world: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067)
- 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66273 and previous config saved to /var/cache/conftool/dbconfig/20240711-070004-root.json
- 06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66272 and previous config saved to /var/cache/conftool/dbconfig/20240711-065508-arnaudb.json
- 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66271 and previous config saved to /var/cache/conftool/dbconfig/20240711-065432-marostegui.json
- 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66267 and previous config saved to /var/cache/conftool/dbconfig/20240711-062953-root.json
- 06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P66266 and previous config saved to /var/cache/conftool/dbconfig/20240711-062454-arnaudb.json
- 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66265 and previous config saved to /var/cache/conftool/dbconfig/20240711-062417-marostegui.json
- 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66264 and previous config saved to /var/cache/conftool/dbconfig/20240711-061447-root.json
- 06:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66263 and previous config saved to /var/cache/conftool/dbconfig/20240711-060947-arnaudb.json
- 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66262 and previous config saved to /var/cache/conftool/dbconfig/20240711-060910-marostegui.json
- 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66261 and previous config saved to /var/cache/conftool/dbconfig/20240711-060736-arnaudb.json
- 06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
- 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66260 and previous config saved to /var/cache/conftool/dbconfig/20240711-060714-arnaudb.json
- 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66259 and previous config saved to /var/cache/conftool/dbconfig/20240711-055942-root.json
- 05:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66258 and previous config saved to /var/cache/conftool/dbconfig/20240711-055206-arnaudb.json
- 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66257 and previous config saved to /var/cache/conftool/dbconfig/20240711-054436-root.json
- 05:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66256 and previous config saved to /var/cache/conftool/dbconfig/20240711-053659-arnaudb.json
- 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66255 and previous config saved to /var/cache/conftool/dbconfig/20240711-052931-root.json
- 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1163 T369514', diff saved to https://phabricator.wikimedia.org/P66254 and previous config saved to /var/cache/conftool/dbconfig/20240711-052702-root.json
- 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1184 to s1 primary and set section read-write T369514', diff saved to https://phabricator.wikimedia.org/P66253 and previous config saved to /var/cache/conftool/dbconfig/20240711-052540-root.json
- 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T369514', diff saved to https://phabricator.wikimedia.org/P66252 and previous config saved to /var/cache/conftool/dbconfig/20240711-052507-root.json
- 05:24 marostegui: Starting s1 eqiad failover from db1163 to db1184 - T369514
- 05:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66251 and previous config saved to /var/cache/conftool/dbconfig/20240711-052151-arnaudb.json
- 05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66250 and previous config saved to /var/cache/conftool/dbconfig/20240711-051941-arnaudb.json
- 05:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 05:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
- 05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66249 and previous config saved to /var/cache/conftool/dbconfig/20240711-051920-arnaudb.json
- 05:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66248 and previous config saved to /var/cache/conftool/dbconfig/20240711-050413-arnaudb.json
- 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1184 from API/vslow/dump T369514', diff saved to https://phabricator.wikimedia.org/P66247 and previous config saved to /var/cache/conftool/dbconfig/20240711-045905-marostegui.json
- 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1184 with weight 0 T369514', diff saved to https://phabricator.wikimedia.org/P66246 and previous config saved to /var/cache/conftool/dbconfig/20240711-045829-marostegui.json
- 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
- 04:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66245 and previous config saved to /var/cache/conftool/dbconfig/20240711-044905-arnaudb.json
- 04:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66244 and previous config saved to /var/cache/conftool/dbconfig/20240711-043358-arnaudb.json
- 04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66243 and previous config saved to /var/cache/conftool/dbconfig/20240711-043147-arnaudb.json
- 04:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
- 04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66242 and previous config saved to /var/cache/conftool/dbconfig/20240711-043124-arnaudb.json
- 04:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66241 and previous config saved to /var/cache/conftool/dbconfig/20240711-041617-arnaudb.json
- 04:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66240 and previous config saved to /var/cache/conftool/dbconfig/20240711-040110-arnaudb.json
- 03:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66239 and previous config saved to /var/cache/conftool/dbconfig/20240711-034603-arnaudb.json
- 03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66238 and previous config saved to /var/cache/conftool/dbconfig/20240711-034352-arnaudb.json
- 03:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 03:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
- 03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66237 and previous config saved to /var/cache/conftool/dbconfig/20240711-034330-arnaudb.json
- 03:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66236 and previous config saved to /var/cache/conftool/dbconfig/20240711-032823-arnaudb.json
- 03:20 eileen: civicrm upgraded from 04cb9083 to 3287ced0
- 03:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66235 and previous config saved to /var/cache/conftool/dbconfig/20240711-031316-arnaudb.json
- 03:08 eileen: civicrm upgraded from 2d1a0aad to 04cb9083
- 02:58 eileen: config revision changed from e02c3a85 to 540f27e6
- 02:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66234 and previous config saved to /var/cache/conftool/dbconfig/20240711-025809-arnaudb.json
- 02:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66233 and previous config saved to /var/cache/conftool/dbconfig/20240711-025558-arnaudb.json
- 02:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 02:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
- 02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66232 and previous config saved to /var/cache/conftool/dbconfig/20240711-025537-arnaudb.json
- 02:48 eileen: civicrm upgraded from a17496a2 to 2d1a0aad
- 02:45 mutante: stewards2001 - sudo mv /srv/repos/users-db /root/ - run puppet and let it recreate the usersdb repo - this time pulling from gitlab - T369780 T369430
- 02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66231 and previous config saved to /var/cache/conftool/dbconfig/20240711-024030-arnaudb.json
- 02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66230 and previous config saved to /var/cache/conftool/dbconfig/20240711-022522-arnaudb.json
- 02:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bookworm
- 02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66229 and previous config saved to /var/cache/conftool/dbconfig/20240711-021015-arnaudb.json
- 02:08 eileen: civicrm upgraded from a03085ff to 1e2fcba3
- 02:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66228 and previous config saved to /var/cache/conftool/dbconfig/20240711-020805-arnaudb.json
- 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
- 02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66227 and previous config saved to /var/cache/conftool/dbconfig/20240711-020738-arnaudb.json
- 01:54 eileen: config revision changed from 840e6b90 to e02c3a85
- 01:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66226 and previous config saved to /var/cache/conftool/dbconfig/20240711-015231-arnaudb.json
- 01:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
- 01:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
- 01:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66225 and previous config saved to /var/cache/conftool/dbconfig/20240711-013723-arnaudb.json
- 01:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bookworm
- 01:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66224 and previous config saved to /var/cache/conftool/dbconfig/20240711-012216-arnaudb.json
- 01:21 mutante: gerrit-replica.wikimedia.org (gerrit2002) - switched firewall provider from iptables to nftables - all seems fine to me but just in case: gerrit:1053068 can be reverted to go back
- 01:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66223 and previous config saved to /var/cache/conftool/dbconfig/20240711-012006-arnaudb.json
- 01:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 01:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
- 01:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66222 and previous config saved to /var/cache/conftool/dbconfig/20240711-011944-arnaudb.json
- 01:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66221 and previous config saved to /var/cache/conftool/dbconfig/20240711-010437-arnaudb.json
- 00:55 mutante: gerrit-replica.wikimedia.org (gerrit2002) - maintenance
- 00:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66220 and previous config saved to /var/cache/conftool/dbconfig/20240711-004930-arnaudb.json
- 00:49 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
- 00:49 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
- 00:49 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
- 00:48 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
- 00:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66219 and previous config saved to /var/cache/conftool/dbconfig/20240711-003423-arnaudb.json
- 00:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66218 and previous config saved to /var/cache/conftool/dbconfig/20240711-003212-arnaudb.json
- 00:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 00:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
- 00:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66217 and previous config saved to /var/cache/conftool/dbconfig/20240711-003150-arnaudb.json
- 00:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66216 and previous config saved to /var/cache/conftool/dbconfig/20240711-001643-arnaudb.json
- 00:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66215 and previous config saved to /var/cache/conftool/dbconfig/20240711-000136-arnaudb.json
2024-07-10
- 23:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66214 and previous config saved to /var/cache/conftool/dbconfig/20240710-234629-arnaudb.json
- 23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66213 and previous config saved to /var/cache/conftool/dbconfig/20240710-234418-arnaudb.json
- 23:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
- 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66212 and previous config saved to /var/cache/conftool/dbconfig/20240710-234356-arnaudb.json
- 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66211 and previous config saved to /var/cache/conftool/dbconfig/20240710-233558-marostegui.json
- 23:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 23:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66210 and previous config saved to /var/cache/conftool/dbconfig/20240710-233535-marostegui.json
- 23:35 rzl: $ sudo cumin A:all-mw enable-puppet T367012
- 23:34 rzl@deploy1002: Finished scap: T367012 (duration: 07m 45s)
- 23:30 rzl@deploy1002: rzl: Continuing with sync
- 23:29 rzl@deploy1002: rzl: T367012 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66209 and previous config saved to /var/cache/conftool/dbconfig/20240710-232849-arnaudb.json
- 23:27 rzl@deploy1002: Started scap sync-world: T367012
- 23:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66208 and previous config saved to /var/cache/conftool/dbconfig/20240710-232028-marostegui.json
- 23:20 rzl: $ sudo cumin A:all-mw disable-puppet # T367012 - really just for the old mwdebug hosts
- 23:16 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 32s)
- 23:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66207 and previous config saved to /var/cache/conftool/dbconfig/20240710-231342-arnaudb.json
- 23:09 zabe@deploy1002: Started scap sync-world: update interwiki cache
- 23:08 zabe@deploy1002: Finished scap: T362529 (duration: 07m 44s)
- 23:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66206 and previous config saved to /var/cache/conftool/dbconfig/20240710-230522-marostegui.json
- 23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66205 and previous config saved to /var/cache/conftool/dbconfig/20240710-230130-marostegui.json
- 23:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66204 and previous config saved to /var/cache/conftool/dbconfig/20240710-230107-marostegui.json
- 23:00 zabe@deploy1002: Started scap sync-world: T362529
- 23:00 zabe: Create Wikimedians of United Arab Emirates User Group Wiki # T362529
- 23:00 mutante: puppetserver1001 - fixing failed unit geoip_update_ipinfo.service
- 22:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66203 and previous config saved to /var/cache/conftool/dbconfig/20240710-225835-arnaudb.json
- 22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66202 and previous config saved to /var/cache/conftool/dbconfig/20240710-225725-arnaudb.json
- 22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 22:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
- 22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66201 and previous config saved to /var/cache/conftool/dbconfig/20240710-225647-arnaudb.json
- 22:53 mutante: puppetmaster1001 - remove Enterprise product ID from MaxMind downloads. sudo systemctl start geoip_update_ipinfo - T366272
- 22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66200 and previous config saved to /var/cache/conftool/dbconfig/20240710-225015-marostegui.json
- 22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66199 and previous config saved to /var/cache/conftool/dbconfig/20240710-224559-marostegui.json
- 22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66198 and previous config saved to /var/cache/conftool/dbconfig/20240710-224140-arnaudb.json
- 22:35 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
- 22:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66197 and previous config saved to /var/cache/conftool/dbconfig/20240710-223052-marostegui.json
- 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66196 and previous config saved to /var/cache/conftool/dbconfig/20240710-222633-arnaudb.json
- 22:25 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
- 22:19 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
- 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66195 and previous config saved to /var/cache/conftool/dbconfig/20240710-221545-marostegui.json
- 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66194 and previous config saved to /var/cache/conftool/dbconfig/20240710-221126-arnaudb.json
- 22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66193 and previous config saved to /var/cache/conftool/dbconfig/20240710-221018-arnaudb.json
- 22:10 mutante: gitlab-replica-b.wikimedia.org - version upgrade in progress
- 22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 22:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 22:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 22:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66192 and previous config saved to /var/cache/conftool/dbconfig/20240710-220951-arnaudb.json
- 22:09 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
- 21:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66191 and previous config saved to /var/cache/conftool/dbconfig/20240710-215444-arnaudb.json
- 21:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66190 and previous config saved to /var/cache/conftool/dbconfig/20240710-213935-arnaudb.json
- 21:30 jdrewniak@deploy1002: Finished scap: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) (duration: 11m 35s)
- 21:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66188 and previous config saved to /var/cache/conftool/dbconfig/20240710-212427-arnaudb.json
- 21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66187 and previous config saved to /var/cache/conftool/dbconfig/20240710-212319-arnaudb.json
- 21:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
- 21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 21:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66186 and previous config saved to /var/cache/conftool/dbconfig/20240710-212257-arnaudb.json
- 21:18 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
- 21:17 jdrewniak@deploy1002: Sync cancelled.
- 21:17 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
- 21:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
- 21:09 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
- 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66185 and previous config saved to /var/cache/conftool/dbconfig/20240710-210750-arnaudb.json
- 21:06 jdrewniak@deploy1002: Sync cancelled.
- 21:06 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
- 21:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
- 20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66184 and previous config saved to /var/cache/conftool/dbconfig/20240710-205242-arnaudb.json
- 20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66183 and previous config saved to /var/cache/conftool/dbconfig/20240710-203735-arnaudb.json
- 20:37 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
- 20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66182 and previous config saved to /var/cache/conftool/dbconfig/20240710-203627-arnaudb.json
- 20:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 20:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
- 20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66181 and previous config saved to /var/cache/conftool/dbconfig/20240710-203605-arnaudb.json
- 20:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66180 and previous config saved to /var/cache/conftool/dbconfig/20240710-202057-arnaudb.json
- 20:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66179 and previous config saved to /var/cache/conftool/dbconfig/20240710-200550-arnaudb.json
- 19:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66178 and previous config saved to /var/cache/conftool/dbconfig/20240710-195043-arnaudb.json
- 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66177 and previous config saved to /var/cache/conftool/dbconfig/20240710-194935-arnaudb.json
- 19:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 19:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66176 and previous config saved to /var/cache/conftool/dbconfig/20240710-194913-arnaudb.json
- 19:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66174 and previous config saved to /var/cache/conftool/dbconfig/20240710-193406-arnaudb.json
- 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66173 and previous config saved to /var/cache/conftool/dbconfig/20240710-191859-arnaudb.json
- 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66172 and previous config saved to /var/cache/conftool/dbconfig/20240710-190352-arnaudb.json
- 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66171 and previous config saved to /var/cache/conftool/dbconfig/20240710-190244-arnaudb.json
- 19:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 19:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66170 and previous config saved to /var/cache/conftool/dbconfig/20240710-190222-arnaudb.json
- 18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66169 and previous config saved to /var/cache/conftool/dbconfig/20240710-184714-arnaudb.json
- 18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
- 18:43 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
- 18:35 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66168 and previous config saved to /var/cache/conftool/dbconfig/20240710-183207-arnaudb.json
- 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66166 and previous config saved to /var/cache/conftool/dbconfig/20240710-181700-arnaudb.json
- 17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66164 and previous config saved to /var/cache/conftool/dbconfig/20240710-171644-arnaudb.json
- 17:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 17:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66163 and previous config saved to /var/cache/conftool/dbconfig/20240710-171622-arnaudb.json
- 17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66162 and previous config saved to /var/cache/conftool/dbconfig/20240710-170143-arnaudb.json
- 17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66161 and previous config saved to /var/cache/conftool/dbconfig/20240710-170115-arnaudb.json
- 16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66160 and previous config saved to /var/cache/conftool/dbconfig/20240710-164637-arnaudb.json
- 16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66159 and previous config saved to /var/cache/conftool/dbconfig/20240710-164608-arnaudb.json
- 16:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66158 and previous config saved to /var/cache/conftool/dbconfig/20240710-164225-ladsgroup.json
- 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66157 and previous config saved to /var/cache/conftool/dbconfig/20240710-163131-arnaudb.json
- 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66156 and previous config saved to /var/cache/conftool/dbconfig/20240710-163100-arnaudb.json
- 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66155 and previous config saved to /var/cache/conftool/dbconfig/20240710-162952-arnaudb.json
- 16:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 16:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240710-162926-arnaudb.json
- 16:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66153 and previous config saved to /var/cache/conftool/dbconfig/20240710-162718-ladsgroup.json
- 16:17 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 16:17 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66152 and previous config saved to /var/cache/conftool/dbconfig/20240710-161626-arnaudb.json
- 16:14 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
- 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66151 and previous config saved to /var/cache/conftool/dbconfig/20240710-161419-arnaudb.json
- 16:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66150 and previous config saved to /var/cache/conftool/dbconfig/20240710-161211-ladsgroup.json
- 16:11 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
- 16:08 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
- 16:05 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
- 16:03 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
- 16:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66149 and previous config saved to /var/cache/conftool/dbconfig/20240710-160120-arnaudb.json
- 16:01 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:00 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:00 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
- 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66148 and previous config saved to /var/cache/conftool/dbconfig/20240710-155911-arnaudb.json
- 15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:58 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66147 and previous config saved to /var/cache/conftool/dbconfig/20240710-155703-ladsgroup.json
- 15:55 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
- 15:54 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
- 15:53 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 15:53 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 15:49 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
- 15:48 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
- 15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66146 and previous config saved to /var/cache/conftool/dbconfig/20240710-154615-arnaudb.json
- 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66145 and previous config saved to /var/cache/conftool/dbconfig/20240710-154404-arnaudb.json
- 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66144 and previous config saved to /var/cache/conftool/dbconfig/20240710-154256-arnaudb.json
- 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66143 and previous config saved to /var/cache/conftool/dbconfig/20240710-154234-arnaudb.json
- 15:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
- 15:36 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
- 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66142 and previous config saved to /var/cache/conftool/dbconfig/20240710-152727-arnaudb.json
- 15:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
- 15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1233 from groups', diff saved to https://phabricator.wikimedia.org/P66141 and previous config saved to /var/cache/conftool/dbconfig/20240710-152616-ladsgroup.json
- 15:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
- 15:24 vgutierrez: rolling restart of high-traffic1 LVSs to switch ncredir to maglev - T368083
- 15:24 topranks: rebooting lsw1-e1-eqiad to install updated JunOS version T365993
- 15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
- 15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
- 15:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
- 15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
- 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
- 15:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
- 15:14 vgutierrez: rolling restart of secondary LVSs to switch ncredir to maglev - T368083
- 15:13 elukey: restart turnilo on an-tool1007
- 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66140 and previous config saved to /var/cache/conftool/dbconfig/20240710-151219-arnaudb.json
- 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66139 and previous config saved to /var/cache/conftool/dbconfig/20240710-145807-marostegui.json
- 14:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 14:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66138 and previous config saved to /var/cache/conftool/dbconfig/20240710-145744-marostegui.json
- 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66137 and previous config saved to /var/cache/conftool/dbconfig/20240710-145712-arnaudb.json
- 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
- 14:55 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
- 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66136 and previous config saved to /var/cache/conftool/dbconfig/20240710-144237-marostegui.json
- 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66135 and previous config saved to /var/cache/conftool/dbconfig/20240710-143713-marostegui.json
- 14:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 14:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66134 and previous config saved to /var/cache/conftool/dbconfig/20240710-143651-marostegui.json
- 14:34 cmooney@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
- 14:34 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
- 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
- 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
- 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
- 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
- 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
- 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
- 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66133 and previous config saved to /var/cache/conftool/dbconfig/20240710-142730-marostegui.json
- 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66132 and previous config saved to /var/cache/conftool/dbconfig/20240710-142144-marostegui.json
- 14:21 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 14:20 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 14:20 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 14:19 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 14:19 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
- 14:19 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
- 14:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:15 effie: disable puppet on mw memcached hosts - T352885
- 14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66131 and previous config saved to /var/cache/conftool/dbconfig/20240710-141222-marostegui.json
- 14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:10 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
- 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
- 14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66130 and previous config saved to /var/cache/conftool/dbconfig/20240710-140637-marostegui.json
- 14:06 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:06 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:04 XioNoX: add ipxe_1.21.1+git-20240627.b66e27d to bookworm-wikimedia reprepro
- 14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 14:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
- 14:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
- 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'T365993 - depool db1190 - s4', diff saved to https://phabricator.wikimedia.org/P66129 and previous config saved to /var/cache/conftool/dbconfig/20240710-140224-arnaudb.json
- 13:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66128 and previous config saved to /var/cache/conftool/dbconfig/20240710-135656-arnaudb.json
- 13:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 13:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66127 and previous config saved to /var/cache/conftool/dbconfig/20240710-135619-arnaudb.json
- 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66126 and previous config saved to /var/cache/conftool/dbconfig/20240710-135130-marostegui.json
- 13:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 13:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 13:46 akosiaris@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1059.*
- 13:44 btullis: re-enabling the misc dumps jobs on snapshot1017 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053315
- 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66125 and previous config saved to /var/cache/conftool/dbconfig/20240710-134112-arnaudb.json
- 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bookworm
- 13:33 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66124 and previous config saved to /var/cache/conftool/dbconfig/20240710-132604-arnaudb.json
- 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
- 13:15 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
- 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66123 and previous config saved to /var/cache/conftool/dbconfig/20240710-131057-arnaudb.json
- 13:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bookworm
- 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66122 and previous config saved to /var/cache/conftool/dbconfig/20240710-125928-root.json
- 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66121 and previous config saved to /var/cache/conftool/dbconfig/20240710-124422-root.json
- 12:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66120 and previous config saved to /var/cache/conftool/dbconfig/20240710-123844-arnaudb.json
- 12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 12:30 topranks: removing unused wmcs vlans from asw2-b-eqiad
- 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66119 and previous config saved to /var/cache/conftool/dbconfig/20240710-122917-root.json
- 12:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
- 12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
- 12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
- 12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
- 12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
- 12:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
- 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66118 and previous config saved to /var/cache/conftool/dbconfig/20240710-121411-root.json
- 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66117 and previous config saved to /var/cache/conftool/dbconfig/20240710-115906-root.json
- 11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 into api with small weight T365805', diff saved to https://phabricator.wikimedia.org/P66116 and previous config saved to /var/cache/conftool/dbconfig/20240710-115046-marostegui.json
- 11:50 claime: cleaned up leftover media files on videoscalers
- 11:50 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
- 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66115 and previous config saved to /var/cache/conftool/dbconfig/20240710-114401-root.json
- 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66114 and previous config saved to /var/cache/conftool/dbconfig/20240710-113010-ladsgroup.json
- 11:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 11:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66113 and previous config saved to /var/cache/conftool/dbconfig/20240710-112856-root.json
- 11:22 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 41s)
- 11:21 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
- 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
- 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
- 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
- 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
- 10:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
- 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
- 10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 10:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 10:26 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 04s)
- 10:26 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
- 10:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
- 10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
- 10:21 jiji@deploy1002: Finished scap: Switch mediawiki everywhere to use node-local mcrouter ds - T346690 (duration: 05m 15s)
- 10:15 jiji@deploy1002: Started scap sync-world: Switch mediawiki everywhere to use node-local mcrouter ds - T346690
- 09:29 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:51 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.13 refs T366958
- 08:41 hashar: On deployment server, unblocked train by manually editing /var/lib/scap/scap/lib/python3.7/site-packages/scap/train.py to allow train blocker task with "progress" status instead of just "open" # T369689
- 08:08 kostajh: UTC morning deploys done
- 08:06 kharlan@deploy1002: Finished scap: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) (duration: 09m 41s)
- 08:00 kharlan@deploy1002: kharlan: Continuing with sync
- 07:59 kharlan@deploy1002: kharlan: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:57 kharlan@deploy1002: Started scap sync-world: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110)
- 07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2025.codfw.wmnet
- 07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2024.codfw.wmnet
- 07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2025.codfw.wmnet
- 07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2024.codfw.wmnet
- 07:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2023.codfw.wmnet
- 07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2020.codfw.wmnet
- 07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2021.codfw.wmnet
- 07:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2022.codfw.wmnet
- 07:28 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2023.codfw.wmnet
- 07:27 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2020.codfw.wmnet
- 07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2021.codfw.wmnet
- 07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2022.codfw.wmnet
- 07:22 kostajh: UTC morning deploys done
- 07:20 kharlan@deploy1002: Finished scap: Backport for IPReputation: Enable extension on testwiki (T360067) (duration: 14m 05s)
- 07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2019.codfw.wmnet
- 07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2018.codfw.wmnet
- 07:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2017.codfw.wmnet
- 07:15 kharlan@deploy1002: kharlan: Continuing with sync
- 07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2018.codfw.wmnet
- 07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2019.codfw.wmnet
- 07:09 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2017.codfw.wmnet
- 07:09 kharlan@deploy1002: kharlan: Backport for IPReputation: Enable extension on testwiki (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:08 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2016.codfw.wmnet
- 07:06 kharlan@deploy1002: Started scap sync-world: Backport for IPReputation: Enable extension on testwiki (T360067)
- 07:02 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2016.codfw.wmnet
- 07:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2015.codfw.wmnet
- 06:58 XioNoX: push policy-statement BGP_agg_net_pops to all CRs (noop as it's not applied there) - T367439
- 06:54 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2015.codfw.wmnet
- 06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17072
- 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 17072
- 06:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2014.codfw.wmnet
- 06:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2013.codfw.wmnet
- 06:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2013.codfw.wmnet
- 06:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
- 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66110 and previous config saved to /var/cache/conftool/dbconfig/20240710-062424-marostegui.json
- 06:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 06:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66109 and previous config saved to /var/cache/conftool/dbconfig/20240710-062401-marostegui.json
- 06:22 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
- 06:16 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host wdqs2012.codfw.wmnet
- 06:15 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
- 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66108 and previous config saved to /var/cache/conftool/dbconfig/20240710-060854-marostegui.json
- 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66107 and previous config saved to /var/cache/conftool/dbconfig/20240710-055347-marostegui.json
- 05:49 marostegui: Deploy schema change on s5 eqiad db1183 dbmaint T367856
- 05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
- 05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
- 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1183 T369616', diff saved to https://phabricator.wikimedia.org/P66106 and previous config saved to /var/cache/conftool/dbconfig/20240710-054710-root.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1230 to s5 primary and set section read-write T369616', diff saved to https://phabricator.wikimedia.org/P66105 and previous config saved to /var/cache/conftool/dbconfig/20240710-054621-marostegui.json
- 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T369616', diff saved to https://phabricator.wikimedia.org/P66104 and previous config saved to /var/cache/conftool/dbconfig/20240710-054559-marostegui.json
- 05:45 marostegui: Starting s5 eqiad failover from db1183 to db1230 - T369616
- 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66103 and previous config saved to /var/cache/conftool/dbconfig/20240710-053840-marostegui.json
- 05:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
- 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1230 with weight 0 T369616', diff saved to https://phabricator.wikimedia.org/P66102 and previous config saved to /var/cache/conftool/dbconfig/20240710-053009-root.json
- 05:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
- 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66101 and previous config saved to /var/cache/conftool/dbconfig/20240710-052520-marostegui.json
- 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66100 and previous config saved to /var/cache/conftool/dbconfig/20240710-052443-marostegui.json
- 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66099 and previous config saved to /var/cache/conftool/dbconfig/20240710-050935-marostegui.json
- 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66098 and previous config saved to /var/cache/conftool/dbconfig/20240710-045428-marostegui.json
- 04:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66097 and previous config saved to /var/cache/conftool/dbconfig/20240710-043921-marostegui.json
- 03:22 eileen: tools upgraded from 95f10b20 to 94bac5c6
2024-07-09
- 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66096 and previous config saved to /var/cache/conftool/dbconfig/20240709-223336-marostegui.json
- 22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66095 and previous config saved to /var/cache/conftool/dbconfig/20240709-223314-marostegui.json
- 22:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66094 and previous config saved to /var/cache/conftool/dbconfig/20240709-221807-marostegui.json
- 22:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66093 and previous config saved to /var/cache/conftool/dbconfig/20240709-220300-marostegui.json
- 21:50 ejegg: payments-wiki upgraded from dc0c14d4 to 4e48059a (and ingenico config removed)
- 21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66092 and previous config saved to /var/cache/conftool/dbconfig/20240709-214752-marostegui.json
- 21:24 ejegg: fundraising civicrm upgraded from 84d6f5d1 to a03085ff
- 21:18 urbanecm@deploy1002: Finished scap: Backport for use text() instead of escaped() for msg recentchanges (T352626) (duration: 21m 50s)
- 21:13 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 21:13 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66091 and previous config saved to /var/cache/conftool/dbconfig/20240709-211231-ladsgroup.json
- 21:12 urbanecm@deploy1002: gergesshamon, urbanecm: Continuing with sync
- 21:00 urbanecm@deploy1002: gergesshamon, urbanecm: Backport for use text() instead of escaped() for msg recentchanges (T352626) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66090 and previous config saved to /var/cache/conftool/dbconfig/20240709-205724-ladsgroup.json
- 20:56 urbanecm@deploy1002: Started scap sync-world: Backport for use text() instead of escaped() for msg recentchanges (T352626)
- 20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66089 and previous config saved to /var/cache/conftool/dbconfig/20240709-204217-ladsgroup.json
- 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66088 and previous config saved to /var/cache/conftool/dbconfig/20240709-202709-ladsgroup.json
- 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66087 and previous config saved to /var/cache/conftool/dbconfig/20240709-201928-marostegui.json
- 20:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 20:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66086 and previous config saved to /var/cache/conftool/dbconfig/20240709-201906-marostegui.json
- 20:16 urbanecm@deploy1002: Finished scap: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) (duration: 13m 01s)
- 20:10 urbanecm@deploy1002: urbanecm, pppery: Continuing with sync
- 20:07 urbanecm@deploy1002: urbanecm, pppery: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66084 and previous config saved to /var/cache/conftool/dbconfig/20240709-200359-marostegui.json
- 20:03 urbanecm@deploy1002: Started scap sync-world: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018)
- 19:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66083 and previous config saved to /var/cache/conftool/dbconfig/20240709-194851-marostegui.json
- 19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66082 and previous config saved to /var/cache/conftool/dbconfig/20240709-193344-marostegui.json
- 17:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:11 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 17:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
- 17:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
- 16:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66080 and previous config saved to /var/cache/conftool/dbconfig/20240709-165921-root.json
- 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66079 and previous config saved to /var/cache/conftool/dbconfig/20240709-165746-root.json
- 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66078 and previous config saved to /var/cache/conftool/dbconfig/20240709-165738-root.json
- 16:57 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 16:57 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 16:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66077 and previous config saved to /var/cache/conftool/dbconfig/20240709-164415-root.json
- 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66076 and previous config saved to /var/cache/conftool/dbconfig/20240709-164241-root.json
- 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66075 and previous config saved to /var/cache/conftool/dbconfig/20240709-164233-root.json
- 16:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 16:40 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 16:30 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c] (duration: 03m 41s)
- 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66074 and previous config saved to /var/cache/conftool/dbconfig/20240709-162909-root.json
- 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66073 and previous config saved to /var/cache/conftool/dbconfig/20240709-162735-root.json
- 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66072 and previous config saved to /var/cache/conftool/dbconfig/20240709-162727-root.json
- 16:26 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c]
- 16:25 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c] (duration: 04m 05s)
- 16:21 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c]
- 16:20 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 01m 18s)
- 16:19 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
- 16:19 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 04m 51s)
- 16:14 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
- 16:14 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 09m 23s)
- 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66071 and previous config saved to /var/cache/conftool/dbconfig/20240709-161404-root.json
- 16:14 btullis: pooled druid1010
- 16:13 btullis: unset noout mode on the cephosd cluster
- 16:13 btullis: uncordoned dse-k8s-worker1006
- 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66070 and previous config saved to /var/cache/conftool/dbconfig/20240709-161230-root.json
- 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66069 and previous config saved to /var/cache/conftool/dbconfig/20240709-161222-root.json
- 16:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 16:04 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
- 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66068 and previous config saved to /var/cache/conftool/dbconfig/20240709-155858-root.json
- 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66067 and previous config saved to /var/cache/conftool/dbconfig/20240709-155724-root.json
- 15:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66066 and previous config saved to /var/cache/conftool/dbconfig/20240709-155717-root.json
- 15:56 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
- 15:46 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
- 15:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 15:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:44 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66065 and previous config saved to /var/cache/conftool/dbconfig/20240709-154353-root.json
- 15:42 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66064 and previous config saved to /var/cache/conftool/dbconfig/20240709-154219-root.json
- 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66063 and previous config saved to /var/cache/conftool/dbconfig/20240709-154211-root.json
- 15:41 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:41 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
- 15:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
- 15:35 sukhe: remove traffic-dnsbox VM on cloud-vps: T360710
- 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66062 and previous config saved to /var/cache/conftool/dbconfig/20240709-152847-root.json
- 15:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
- 15:27 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for 9 hosts
- 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66061 and previous config saved to /var/cache/conftool/dbconfig/20240709-152713-root.json
- 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66060 and previous config saved to /var/cache/conftool/dbconfig/20240709-152706-root.json
- 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:11 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:08 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 15:04 topranks: rebooting lsw1-e3-eqiad to install updated JunOS version T365998
- 15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
- 15:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
- 15:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 9 hosts with reason: network maintenance
- 15:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 9 hosts with reason: network maintenance
- 15:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
- 14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
- 14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 14:53 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
- 14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
- 14:50 hashar: Restart Gerrit primary on gerrit1003 to apply a configuration change | T367505
- 14:46 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
- 14:46 hashar@deploy1002: Finished deploy [integration/docroot@c8b0266]: (no justification provided) (duration: 00m 07s)
- 14:46 hashar@deploy1002: Started deploy [integration/docroot@c8b0266]: (no justification provided)
- 14:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
- 14:43 Lucas_WMDE: UTC afternoon backport+config window done
- 14:40 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
- 14:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
- 14:38 sukhe: dummy authdns-update
- 14:38 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
- 14:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
- 14:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) (duration: 14m 35s)
- 14:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
- 14:32 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
- 14:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:28 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
- 14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
- 14:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
- 14:26 hnowlan@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1061.eqiad.wmnet|kubernetes1048.eqiad.wmnet|kubernetes1047.eqiad.wmnet|kubernetes1049.eqiad.wmnet|kubernetes1050.eqiad.wmnet|kubernetes1051.eqiad.wmnet|mw1491.eqiad.wmnet|mw1492.eqiad.wmnet|mw1493.eqiad.wmnet),cluster=kubernetes,service=kubesvc
- 14:26 hnowlan: kubectl drain kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet mw1492.eqiad.wmnet mw1492.eqiad.wmnet (T365995)
- 14:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 14:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600)
- 14:21 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
- 14:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
- 14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Re-introduce notices (T369053) (duration: 39m 17s)
- 14:15 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 14:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 14:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
- 14:12 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
- 14:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Continuing with sync
- 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Backport for Re-introduce notices (T369053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 14:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
- 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66059 and previous config saved to /var/cache/conftool/dbconfig/20240709-140033-ladsgroup.json
- 14:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 14:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 13:59 XioNoX: netbox-deploy - rebase the dev branch into main
- 13:41 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
- 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Re-introduce notices (T369053)
- 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66058 and previous config saved to /var/cache/conftool/dbconfig/20240709-133450-marostegui.json
- 13:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 13:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66057 and previous config saved to /var/cache/conftool/dbconfig/20240709-133428-marostegui.json
- 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66056 and previous config saved to /var/cache/conftool/dbconfig/20240709-131921-marostegui.json
- 13:16 sukhe: dummy authdns-update run
- 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) (duration: 08m 28s)
- 13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Continuing with sync
- 13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241)
- 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66055 and previous config saved to /var/cache/conftool/dbconfig/20240709-130414-marostegui.json
- 12:59 hashar: Restart Gerrit replica on gerrit2002 to apply a configuration change | T367505
- 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66054 and previous config saved to /var/cache/conftool/dbconfig/20240709-124907-marostegui.json
- 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66053 and previous config saved to /var/cache/conftool/dbconfig/20240709-120440-root.json
- 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1001.wikimedia.org
- 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
- 11:59 eoghan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
- 11:54 eoghan@cumin1002: START - Cookbook sre.dns.netbox
- 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66052 and previous config saved to /var/cache/conftool/dbconfig/20240709-114935-root.json
- 11:45 eoghan@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1001.wikimedia.org
- 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66051 and previous config saved to /var/cache/conftool/dbconfig/20240709-113430-root.json
- 11:28 eoghan: Decommissioning lists1001 T331706
- 11:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
- 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66050 and previous config saved to /var/cache/conftool/dbconfig/20240709-112611-root.json
- 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66049 and previous config saved to /var/cache/conftool/dbconfig/20240709-111925-root.json
- 11:18 btullis: depooled druid1010 for T365995
- 11:17 btullis: set cephosd cluster into noout mode to prevent rebalancing for T365995
- 11:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
- 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:15 btullis: drained dse-k8s-worker1006.eqiad.wmnet ready for T365995
- 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:12 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:11 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66048 and previous config saved to /var/cache/conftool/dbconfig/20240709-111105-root.json
- 11:10 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:10 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66047 and previous config saved to /var/cache/conftool/dbconfig/20240709-110420-root.json
- 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66046 and previous config saved to /var/cache/conftool/dbconfig/20240709-105600-root.json
- 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66045 and previous config saved to /var/cache/conftool/dbconfig/20240709-105454-marostegui.json
- 10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 10:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
- 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66044 and previous config saved to /var/cache/conftool/dbconfig/20240709-104914-root.json
- 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66043 and previous config saved to /var/cache/conftool/dbconfig/20240709-104054-root.json
- 10:37 Dreamy_Jazz: Finished running maintenance scripts for T366781
- 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66042 and previous config saved to /var/cache/conftool/dbconfig/20240709-103409-root.json
- 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212 T369515', diff saved to https://phabricator.wikimedia.org/P66041 and previous config saved to /var/cache/conftool/dbconfig/20240709-103331-root.json
- 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T369515', diff saved to https://phabricator.wikimedia.org/P66040 and previous config saved to /var/cache/conftool/dbconfig/20240709-103238-root.json
- 10:32 marostegui: Starting s1 codfw failover from db2212 to db2203 - T369515
- 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 db1198 db1199 T365995', diff saved to https://phabricator.wikimedia.org/P66039 and previous config saved to /var/cache/conftool/dbconfig/20240709-102947-root.json
- 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66038 and previous config saved to /var/cache/conftool/dbconfig/20240709-102549-root.json
- 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66037 and previous config saved to /var/cache/conftool/dbconfig/20240709-101043-root.json
- 10:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 10:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
- 09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T369515', diff saved to https://phabricator.wikimedia.org/P66036 and previous config saved to /var/cache/conftool/dbconfig/20240709-095659-root.json
- 09:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66035 and previous config saved to /var/cache/conftool/dbconfig/20240709-095538-root.json
- 09:26 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided) (duration: 00m 32s)
- 09:26 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided)
- 09:06 vgutierrez: restart purged @ cp3073
- 08:28 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 08:28 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 08:28 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 08:27 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 08:17 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.13 refs T366958
- 08:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 08:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 08:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 07:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 07:58 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 07:57 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2002.codfw.wmnet
- 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 07:40 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
- 07:40 Dreamy_Jazz: Morning UTC backport window done
- 07:38 vgutierrez: repool cp3073
- 07:35 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 07:32 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3073.*} and A:cp
- 07:32 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3073.esams.wmnet
- 07:30 dreamyjazz@deploy1002: Synchronized wmf-config/throttle.php: Deploying throttle change for T369522 (duration: 09m 50s)
- 07:26 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts netbox-dev2002.codfw.wmnet
- 07:25 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
- 07:12 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on P{cp3073.*} and A:cp
- 07:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
- 07:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3073.*} and A:cp
- 07:08 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
- 06:54 Dreamy_Jazz: Start `foreachwikiindblist group2.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
- 05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 05:20 marostegui: Deploy schema change on s2 eqiad db1162 dbmaint T367856
- 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
- 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
- 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T369339', diff saved to https://phabricator.wikimedia.org/P66034 and previous config saved to /var/cache/conftool/dbconfig/20240709-051911-marostegui.json
- 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T369339', diff saved to https://phabricator.wikimedia.org/P66033 and previous config saved to /var/cache/conftool/dbconfig/20240709-051814-marostegui.json
- 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T369339', diff saved to https://phabricator.wikimedia.org/P66032 and previous config saved to /var/cache/conftool/dbconfig/20240709-051749-marostegui.json
- 05:17 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T369339
- 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T369339', diff saved to https://phabricator.wikimedia.org/P66031 and previous config saved to /var/cache/conftool/dbconfig/20240709-045814-marostegui.json
- 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
- 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66030 and previous config saved to /var/cache/conftool/dbconfig/20240709-044128-marostegui.json
- 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 04:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 04:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66029 and previous config saved to /var/cache/conftool/dbconfig/20240709-044051-marostegui.json
- 04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66028 and previous config saved to /var/cache/conftool/dbconfig/20240709-042544-marostegui.json
- 04:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66027 and previous config saved to /var/cache/conftool/dbconfig/20240709-041036-marostegui.json
- 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.10 (duration: 00m 57s)
- 03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66026 and previous config saved to /var/cache/conftool/dbconfig/20240709-035529-marostegui.json
- 03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.13 refs T366958 (duration: 50m 52s)
- 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.13 refs T366958
- 01:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66025 and previous config saved to /var/cache/conftool/dbconfig/20240709-014242-arnaudb.json
- 01:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66024 and previous config saved to /var/cache/conftool/dbconfig/20240709-012735-arnaudb.json
- 01:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66023 and previous config saved to /var/cache/conftool/dbconfig/20240709-011227-arnaudb.json
- 00:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66022 and previous config saved to /var/cache/conftool/dbconfig/20240709-005720-arnaudb.json
- 00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66021 and previous config saved to /var/cache/conftool/dbconfig/20240709-005456-arnaudb.json
- 00:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 00:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 00:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
- 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
- 00:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 00:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 00:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66020 and previous config saved to /var/cache/conftool/dbconfig/20240709-001324-arnaudb.json
- 00:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 00:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66019 and previous config saved to /var/cache/conftool/dbconfig/20240709-001250-marostegui.json
- 00:05 ejegg: payments-wiki upgraded from 82a5e588 to dc0c14d4
2024-07-08
- 23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66018 and previous config saved to /var/cache/conftool/dbconfig/20240708-235817-arnaudb.json
- 23:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66017 and previous config saved to /var/cache/conftool/dbconfig/20240708-235742-marostegui.json
- 23:52 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams
- 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66016 and previous config saved to /var/cache/conftool/dbconfig/20240708-234310-arnaudb.json
- 23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66015 and previous config saved to /var/cache/conftool/dbconfig/20240708-234235-marostegui.json
- 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66014 and previous config saved to /var/cache/conftool/dbconfig/20240708-232803-arnaudb.json
- 23:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66013 and previous config saved to /var/cache/conftool/dbconfig/20240708-232728-marostegui.json
- 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66012 and previous config saved to /var/cache/conftool/dbconfig/20240708-232549-arnaudb.json
- 23:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 23:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66011 and previous config saved to /var/cache/conftool/dbconfig/20240708-232527-arnaudb.json
- 23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66010 and previous config saved to /var/cache/conftool/dbconfig/20240708-231020-arnaudb.json
- 22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66009 and previous config saved to /var/cache/conftool/dbconfig/20240708-225513-arnaudb.json
- 22:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 22:42 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams
- 22:42 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
- 22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66008 and previous config saved to /var/cache/conftool/dbconfig/20240708-224006-arnaudb.json
- 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66007 and previous config saved to /var/cache/conftool/dbconfig/20240708-223752-arnaudb.json
- 22:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 22:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66006 and previous config saved to /var/cache/conftool/dbconfig/20240708-223741-arnaudb.json
- 22:26 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 22:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66005 and previous config saved to /var/cache/conftool/dbconfig/20240708-222234-arnaudb.json
- 22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66004 and previous config saved to /var/cache/conftool/dbconfig/20240708-220727-arnaudb.json
- 21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66003 and previous config saved to /var/cache/conftool/dbconfig/20240708-215220-arnaudb.json
- 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66002 and previous config saved to /var/cache/conftool/dbconfig/20240708-214954-arnaudb.json
- 21:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 21:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66001 and previous config saved to /var/cache/conftool/dbconfig/20240708-214932-arnaudb.json
- 21:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P66000 and previous config saved to /var/cache/conftool/dbconfig/20240708-213425-arnaudb.json
- 21:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 21:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P65999 and previous config saved to /var/cache/conftool/dbconfig/20240708-211918-arnaudb.json
- 21:16 catrope@deploy1002: Finished scap: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) (duration: 09m 23s)
- 21:10 catrope@deploy1002: catrope, nmw03: Continuing with sync
- 21:09 catrope@deploy1002: catrope, nmw03: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:06 catrope@deploy1002: Started scap sync-world: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342)
- 21:05 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
- 21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
- 21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
- 21:05 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
- 21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65998 and previous config saved to /var/cache/conftool/dbconfig/20240708-210410-arnaudb.json
- 21:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1023.eqiad.wmnet
- 21:02 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
- 21:01 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
- 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65997 and previous config saved to /var/cache/conftool/dbconfig/20240708-210144-arnaudb.json
- 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
- 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65996 and previous config saved to /var/cache/conftool/dbconfig/20240708-210106-arnaudb.json
- 20:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1023.eqiad.wmnet
- 20:52 catrope@deploy1002: Finished scap: Backport for Graph extension: Add tracking for data sources used in <graph> tags (duration: 13m 00s)
- 20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1022.eqiad.wmnet
- 20:47 catrope@deploy1002: catrope: Continuing with sync
- 20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65995 and previous config saved to /var/cache/conftool/dbconfig/20240708-204559-arnaudb.json
- 20:43 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
- 20:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 20:42 catrope@deploy1002: catrope: Backport for Graph extension: Add tracking for data sources used in <graph> tags synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P65994 and previous config saved to /var/cache/conftool/dbconfig/20240708-204042-marostegui.json
- 20:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
- 20:39 catrope@deploy1002: Started scap sync-world: Backport for Graph extension: Add tracking for data sources used in <graph> tags
- 20:38 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 20:35 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65993 and previous config saved to /var/cache/conftool/dbconfig/20240708-203052-arnaudb.json
- 20:28 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 20:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 20:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65992 and previous config saved to /var/cache/conftool/dbconfig/20240708-201545-arnaudb.json
- 20:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65991 and previous config saved to /var/cache/conftool/dbconfig/20240708-201318-arnaudb.json
- 20:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 20:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65990 and previous config saved to /var/cache/conftool/dbconfig/20240708-201256-arnaudb.json
- 20:08 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65989 and previous config saved to /var/cache/conftool/dbconfig/20240708-195749-arnaudb.json
- 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P65988 and previous config saved to /var/cache/conftool/dbconfig/20240708-194435-marostegui.json
- 19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 19:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65987 and previous config saved to /var/cache/conftool/dbconfig/20240708-194242-arnaudb.json
- 19:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
- 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65986 and previous config saved to /var/cache/conftool/dbconfig/20240708-192735-arnaudb.json
- 19:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65985 and previous config saved to /var/cache/conftool/dbconfig/20240708-192508-arnaudb.json
- 19:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
- 19:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
- 19:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65984 and previous config saved to /var/cache/conftool/dbconfig/20240708-192444-arnaudb.json
- 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
- 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
- 19:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 19:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 19:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65983 and previous config saved to /var/cache/conftool/dbconfig/20240708-190937-arnaudb.json
- 19:02 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 18:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65982 and previous config saved to /var/cache/conftool/dbconfig/20240708-185430-arnaudb.json
- 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65981 and previous config saved to /var/cache/conftool/dbconfig/20240708-183923-arnaudb.json
- 18:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65980 and previous config saved to /var/cache/conftool/dbconfig/20240708-183658-arnaudb.json
- 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
- 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
- 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
- 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65979 and previous config saved to /var/cache/conftool/dbconfig/20240708-183548-arnaudb.json
- 18:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65978 and previous config saved to /var/cache/conftool/dbconfig/20240708-182041-arnaudb.json
- 18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2002.codfw.wmnet
- 18:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65977 and previous config saved to /var/cache/conftool/dbconfig/20240708-180533-arnaudb.json
- 18:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader2002.codfw.wmnet
- 17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65976 and previous config saved to /var/cache/conftool/dbconfig/20240708-175026-arnaudb.json
- 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65975 and previous config saved to /var/cache/conftool/dbconfig/20240708-174918-arnaudb.json
- 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65974 and previous config saved to /var/cache/conftool/dbconfig/20240708-174823-arnaudb.json
- 17:40 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
- 17:38 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
- 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65973 and previous config saved to /var/cache/conftool/dbconfig/20240708-173316-arnaudb.json
- 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65972 and previous config saved to /var/cache/conftool/dbconfig/20240708-171810-arnaudb.json
- 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65971 and previous config saved to /var/cache/conftool/dbconfig/20240708-170302-arnaudb.json
- 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65970 and previous config saved to /var/cache/conftool/dbconfig/20240708-170053-arnaudb.json
- 17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
- 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65969 and previous config saved to /var/cache/conftool/dbconfig/20240708-170031-arnaudb.json
- 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65968 and previous config saved to /var/cache/conftool/dbconfig/20240708-164524-arnaudb.json
- 16:39 ladsgroup@deploy1002: Finished scap: Backport for Reduce frequency of two query pages in commonswiki (T369024) (duration: 07m 50s)
- 16:34 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 16:33 ladsgroup@deploy1002: ladsgroup: Backport for Reduce frequency of two query pages in commonswiki (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 16:31 ladsgroup@deploy1002: Started scap sync-world: Backport for Reduce frequency of two query pages in commonswiki (T369024)
- 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65967 and previous config saved to /var/cache/conftool/dbconfig/20240708-163017-arnaudb.json
- 16:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
- 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65966 and previous config saved to /var/cache/conftool/dbconfig/20240708-161510-arnaudb.json
- 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65965 and previous config saved to /var/cache/conftool/dbconfig/20240708-161302-arnaudb.json
- 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65964 and previous config saved to /var/cache/conftool/dbconfig/20240708-161238-arnaudb.json
- 16:09 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
- 16:08 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1011.eqiad.wmnet with OS bullseye
- 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
- 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
- 15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65963 and previous config saved to /var/cache/conftool/dbconfig/20240708-155731-arnaudb.json
- 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
- 15:47 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:46 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:45 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 54s)
- 15:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 15:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65962 and previous config saved to /var/cache/conftool/dbconfig/20240708-154224-arnaudb.json
- 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
- 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
- 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65961 and previous config saved to /var/cache/conftool/dbconfig/20240708-152717-arnaudb.json
- 15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65960 and previous config saved to /var/cache/conftool/dbconfig/20240708-152508-arnaudb.json
- 15:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65959 and previous config saved to /var/cache/conftool/dbconfig/20240708-152446-arnaudb.json
- 15:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1227 weight (T366852)', diff saved to https://phabricator.wikimedia.org/P65958 and previous config saved to /var/cache/conftool/dbconfig/20240708-152222-ladsgroup.json
- 15:16 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
- 15:13 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
- 15:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65957 and previous config saved to /var/cache/conftool/dbconfig/20240708-150939-arnaudb.json
- 14:59 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1011.eqiad.wmnet with OS bullseye
- 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1002.eqiad.wmnet
- 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65956 and previous config saved to /var/cache/conftool/dbconfig/20240708-145432-arnaudb.json
- 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
- 14:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
- 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
- 14:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
- 14:51 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
- 14:51 claime: cleaning up old shellbox files on mw1438
- 14:43 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
- 14:43 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
- 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65955 and previous config saved to /var/cache/conftool/dbconfig/20240708-143925-arnaudb.json
- 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65954 and previous config saved to /var/cache/conftool/dbconfig/20240708-143716-arnaudb.json
- 14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65953 and previous config saved to /var/cache/conftool/dbconfig/20240708-143654-arnaudb.json
- 14:34 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
- 14:31 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
- 14:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:27 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 14:22 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 14:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 14:21 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65952 and previous config saved to /var/cache/conftool/dbconfig/20240708-142147-arnaudb.json
- 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:18 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
- 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
- 14:17 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
- 14:16 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
- 14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 14:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
- 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65951 and previous config saved to /var/cache/conftool/dbconfig/20240708-141432-marostegui.json
- 14:13 claime: cleaning up old shellbox files on mw1446
- 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65950 and previous config saved to /var/cache/conftool/dbconfig/20240708-140640-arnaudb.json
- 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65949 and previous config saved to /var/cache/conftool/dbconfig/20240708-135925-marostegui.json
- 13:58 urbanecm@deploy1002: Finished scap: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 (duration: 10m 36s)
- 13:53 urbanecm@deploy1002: phuedx, urbanecm: Continuing with sync
- 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65948 and previous config saved to /var/cache/conftool/dbconfig/20240708-135132-arnaudb.json
- 13:50 urbanecm@deploy1002: phuedx, urbanecm: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65947 and previous config saved to /var/cache/conftool/dbconfig/20240708-135024-arnaudb.json
- 13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65946 and previous config saved to /var/cache/conftool/dbconfig/20240708-135002-arnaudb.json
- 13:48 urbanecm@deploy1002: Started scap sync-world: Backport for lib: Update metrics-platform to 84ed8dcbe7c9
- 13:47 urbanecm@deploy1002: Finished scap: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) (duration: 30m 38s)
- 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65945 and previous config saved to /var/cache/conftool/dbconfig/20240708-134418-marostegui.json
- 13:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
- 13:39 urbanecm@deploy1002: tchin, jforrester, urbanecm: Continuing with sync
- 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65944 and previous config saved to /var/cache/conftool/dbconfig/20240708-133456-arnaudb.json
- 13:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
- 13:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
- 13:32 urbanecm@deploy1002: tchin, jforrester, urbanecm: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
- 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65943 and previous config saved to /var/cache/conftool/dbconfig/20240708-132911-marostegui.json
- 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65942 and previous config saved to /var/cache/conftool/dbconfig/20240708-131948-arnaudb.json
- 13:17 urbanecm@deploy1002: Started scap sync-world: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408)
- 13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65941 and previous config saved to /var/cache/conftool/dbconfig/20240708-130441-arnaudb.json
- 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65940 and previous config saved to /var/cache/conftool/dbconfig/20240708-130333-arnaudb.json
- 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 13:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 13:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bookworm
- 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:48 vgutierrez: test bwlimit per url on cp4051 - T317799
- 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json
- 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
- 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
- 12:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
- 12:32 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
- 12:27 btullis@deploy1002: Finished deploy [airflow-dags/analytics@a2faba7]: (no justification provided) (duration: 00m 27s)
- 12:27 btullis@deploy1002: Started deploy [airflow-dags/analytics@a2faba7]: (no justification provided)
- 12:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bookworm
- 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65938 and previous config saved to /var/cache/conftool/dbconfig/20240708-115422-root.json
- 11:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262476
- 11:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262476
- 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65937 and previous config saved to /var/cache/conftool/dbconfig/20240708-113917-root.json
- 11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
- 11:27 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 11:26 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 11:26 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 11:25 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 11:25 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 11:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65936 and previous config saved to /var/cache/conftool/dbconfig/20240708-112411-root.json
- 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65935 and previous config saved to /var/cache/conftool/dbconfig/20240708-110905-root.json
- 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
- 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
- 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65934 and previous config saved to /var/cache/conftool/dbconfig/20240708-105400-root.json
- 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65933 and previous config saved to /var/cache/conftool/dbconfig/20240708-105348-marostegui.json
- 10:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
- 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65932 and previous config saved to /var/cache/conftool/dbconfig/20240708-105325-marostegui.json
- 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams
- 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
- 10:45 fabfur: rebooting A:cp-esams (T366555)
- 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270359
- 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270359
- 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
- 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
- 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262476
- 10:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262476
- 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272432
- 10:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 272432
- 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65931 and previous config saved to /var/cache/conftool/dbconfig/20240708-103854-root.json
- 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65930 and previous config saved to /var/cache/conftool/dbconfig/20240708-103818-marostegui.json
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65929 and previous config saved to /var/cache/conftool/dbconfig/20240708-102347-root.json
- 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65928 and previous config saved to /var/cache/conftool/dbconfig/20240708-102311-marostegui.json
- 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65927 and previous config saved to /var/cache/conftool/dbconfig/20240708-100804-marostegui.json
- 10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 10:02 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:58 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 09:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 09:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
- 09:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
- 09:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
- 09:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
- 09:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
- 09:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
- 09:41 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
- 09:41 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
- 09:38 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
- 09:38 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
- 09:32 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
- 09:32 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
- 09:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
- 09:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
- 09:17 arturo: aborrero@apt1002:~$ sudo -i reprepro --component thirdparty/k9s includedeb bookworm-wikimedia /home/aborrero/k9s_linux_amd64.deb (T366061)
- 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
- 08:56 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
- 08:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
- 08:50 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
- 08:42 arturo: update packages for thirdparty/kubeadm-k8s-1-25 bookworm-wikimedia in apt1002 (T369163)
- 08:26 godog: re-enable business hours americas oncall - T369122
- 07:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270052
- 07:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 270052
- 06:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52455
- 06:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52455
- 06:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137409
- 06:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137409
- 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27768
- 06:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 27768
- 06:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61512
- 06:09 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61512
- 06:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269783
- 06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269783
- 06:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
- 06:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52320
- 06:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7738
- 06:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 7738
- 06:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52468
- 06:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
- 06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270052
- 06:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270052
- 05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28008
- 05:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28008
- 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17072
- 05:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17072
- 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263522
- 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263522
- 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61942
- 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61942
- 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18013
- 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18013
- 05:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
- 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
- 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61672
- 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61672
- 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28352
- 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28352
- 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 999
- 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 999
- 05:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4788
- 05:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4788
- 05:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132167
- 05:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 132167
- 05:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6447
- 05:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6447
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65926 and previous config saved to /var/cache/conftool/dbconfig/20240708-053133-marostegui.json
- 05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
- 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65925 and previous config saved to /var/cache/conftool/dbconfig/20240708-053122-marostegui.json
- 05:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28306
- 05:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28306
- 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
- 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
- 05:24 marostegui: Deploy schema change on s5 codfw db2213 dbmaint T367856
- 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T369478', diff saved to https://phabricator.wikimedia.org/P65923 and previous config saved to /var/cache/conftool/dbconfig/20240708-051935-root.json
- 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2123 to s5 primary T369478', diff saved to https://phabricator.wikimedia.org/P65922 and previous config saved to /var/cache/conftool/dbconfig/20240708-051840-root.json
- 05:18 marostegui: Starting s5 codfw failover from db2213 to db2123 - T369478
- 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65921 and previous config saved to /var/cache/conftool/dbconfig/20240708-051615-marostegui.json
- 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2123 from dump/slow', diff saved to https://phabricator.wikimedia.org/P65920 and previous config saved to /var/cache/conftool/dbconfig/20240708-051605-marostegui.json
- 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
- 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2123 with weight 0 T369478', diff saved to https://phabricator.wikimedia.org/P65919 and previous config saved to /var/cache/conftool/dbconfig/20240708-050301-root.json
- 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
- 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65918 and previous config saved to /var/cache/conftool/dbconfig/20240708-045246-marostegui.json
- 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65917 and previous config saved to /var/cache/conftool/dbconfig/20240708-043738-marostegui.json
- 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65916 and previous config saved to /var/cache/conftool/dbconfig/20240708-014044-marostegui.json
- 01:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 01:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
- 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65915 and previous config saved to /var/cache/conftool/dbconfig/20240708-014022-marostegui.json
- 01:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65914 and previous config saved to /var/cache/conftool/dbconfig/20240708-012515-marostegui.json
- 01:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65913 and previous config saved to /var/cache/conftool/dbconfig/20240708-011008-marostegui.json
- 00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65912 and previous config saved to /var/cache/conftool/dbconfig/20240708-005501-marostegui.json
2024-07-07
- 21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65911 and previous config saved to /var/cache/conftool/dbconfig/20240707-215014-marostegui.json
- 21:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
- 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65910 and previous config saved to /var/cache/conftool/dbconfig/20240707-214952-marostegui.json
- 21:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65909 and previous config saved to /var/cache/conftool/dbconfig/20240707-213445-marostegui.json
- 21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65908 and previous config saved to /var/cache/conftool/dbconfig/20240707-211938-marostegui.json
- 21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65907 and previous config saved to /var/cache/conftool/dbconfig/20240707-210430-marostegui.json
- 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65906 and previous config saved to /var/cache/conftool/dbconfig/20240707-154059-marostegui.json
- 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
- 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
2024-07-06
- 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65905 and previous config saved to /var/cache/conftool/dbconfig/20240706-182625-marostegui.json
- 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65904 and previous config saved to /var/cache/conftool/dbconfig/20240706-181117-marostegui.json
- 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65903 and previous config saved to /var/cache/conftool/dbconfig/20240706-175610-marostegui.json
- 17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65902 and previous config saved to /var/cache/conftool/dbconfig/20240706-174103-marostegui.json
- 17:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 17:18 hnowlan@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65901 and previous config saved to /var/cache/conftool/dbconfig/20240706-124535-marostegui.json
- 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
- 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
- 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65900 and previous config saved to /var/cache/conftool/dbconfig/20240706-075448-marostegui.json
- 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65899 and previous config saved to /var/cache/conftool/dbconfig/20240706-073941-marostegui.json
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65898 and previous config saved to /var/cache/conftool/dbconfig/20240706-072434-marostegui.json
- 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65897 and previous config saved to /var/cache/conftool/dbconfig/20240706-070927-marostegui.json
- 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65896 and previous config saved to /var/cache/conftool/dbconfig/20240706-043535-marostegui.json
- 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65895 and previous config saved to /var/cache/conftool/dbconfig/20240706-043513-marostegui.json
- 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65894 and previous config saved to /var/cache/conftool/dbconfig/20240706-042006-marostegui.json
- 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65893 and previous config saved to /var/cache/conftool/dbconfig/20240706-040459-marostegui.json
- 03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65892 and previous config saved to /var/cache/conftool/dbconfig/20240706-034952-marostegui.json
- 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65891 and previous config saved to /var/cache/conftool/dbconfig/20240706-005648-marostegui.json
- 00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 00:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
- 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65890 and previous config saved to /var/cache/conftool/dbconfig/20240706-005626-marostegui.json
- 00:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65889 and previous config saved to /var/cache/conftool/dbconfig/20240706-004119-marostegui.json
- 00:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65888 and previous config saved to /var/cache/conftool/dbconfig/20240706-002612-marostegui.json
- 00:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65887 and previous config saved to /var/cache/conftool/dbconfig/20240706-001105-marostegui.json
2024-07-05
- 20:05 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 20:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65886 and previous config saved to /var/cache/conftool/dbconfig/20240705-185604-marostegui.json
- 18:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 18:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
- 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65885 and previous config saved to /var/cache/conftool/dbconfig/20240705-185542-marostegui.json
- 18:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65884 and previous config saved to /var/cache/conftool/dbconfig/20240705-184034-marostegui.json
- 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65883 and previous config saved to /var/cache/conftool/dbconfig/20240705-183428-root.json
- 18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65882 and previous config saved to /var/cache/conftool/dbconfig/20240705-182527-marostegui.json
- 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65881 and previous config saved to /var/cache/conftool/dbconfig/20240705-181923-root.json
- 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65880 and previous config saved to /var/cache/conftool/dbconfig/20240705-181020-marostegui.json
- 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65879 and previous config saved to /var/cache/conftool/dbconfig/20240705-180417-root.json
- 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65878 and previous config saved to /var/cache/conftool/dbconfig/20240705-175653-ladsgroup.json
- 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65877 and previous config saved to /var/cache/conftool/dbconfig/20240705-174912-root.json
- 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65876 and previous config saved to /var/cache/conftool/dbconfig/20240705-174146-ladsgroup.json
- 17:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65875 and previous config saved to /var/cache/conftool/dbconfig/20240705-173406-root.json
- 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65874 and previous config saved to /var/cache/conftool/dbconfig/20240705-172639-ladsgroup.json
- 17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65873 and previous config saved to /var/cache/conftool/dbconfig/20240705-171901-root.json
- 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65872 and previous config saved to /var/cache/conftool/dbconfig/20240705-171131-ladsgroup.json
- 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65871 and previous config saved to /var/cache/conftool/dbconfig/20240705-170356-root.json
- 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@73c6618]: (no justification provided) (duration: 00m 06s)
- 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@73c6618]: (no justification provided)
- 13:40 hashar@deploy1002: Finished deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484 (duration: 00m 06s)
- 13:40 hashar@deploy1002: Started deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484
- 12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
- 12:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
- 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65869 and previous config saved to /var/cache/conftool/dbconfig/20240705-125152-marostegui.json
- 12:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
- 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65868 and previous config saved to /var/cache/conftool/dbconfig/20240705-125130-marostegui.json
- 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65867 and previous config saved to /var/cache/conftool/dbconfig/20240705-123623-marostegui.json
- 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65866 and previous config saved to /var/cache/conftool/dbconfig/20240705-122115-marostegui.json
- 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65865 and previous config saved to /var/cache/conftool/dbconfig/20240705-120608-marostegui.json
- 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65864 and previous config saved to /var/cache/conftool/dbconfig/20240705-115703-ladsgroup.json
- 11:53 dcausse: T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)
- 11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65863 and previous config saved to /var/cache/conftool/dbconfig/20240705-114157-ladsgroup.json
- 11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65862 and previous config saved to /var/cache/conftool/dbconfig/20240705-112652-ladsgroup.json
- 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65861 and previous config saved to /var/cache/conftool/dbconfig/20240705-111322-ladsgroup.json
- 11:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
- 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65860 and previous config saved to /var/cache/conftool/dbconfig/20240705-111146-ladsgroup.json
- 10:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 10:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 10:41 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) (duration: 21m 22s)
- 10:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 10:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 10:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149)
- 10:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 10:10 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:35 fabfur: running puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052271 (T369345)
- 09:26 XioNoX: netbox-dev2003: move from netbox-dev to netbox-next - T336275
- 08:55 godog: silence NELNotReported NELByCountryNotReported until Tues - T369345
- 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65858 and previous config saved to /var/cache/conftool/dbconfig/20240705-085406-marostegui.json
- 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
- 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65857 and previous config saved to /var/cache/conftool/dbconfig/20240705-085329-marostegui.json
- 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65856 and previous config saved to /var/cache/conftool/dbconfig/20240705-083821-marostegui.json
- 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65855 and previous config saved to /var/cache/conftool/dbconfig/20240705-082314-marostegui.json
- 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65854 and previous config saved to /var/cache/conftool/dbconfig/20240705-080807-marostegui.json
- 08:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 08:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 07:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65852 and previous config saved to /var/cache/conftool/dbconfig/20240705-051202-marostegui.json
- 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P65851 and previous config saved to /var/cache/conftool/dbconfig/20240705-050028-root.json
- 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65850 and previous config saved to /var/cache/conftool/dbconfig/20240705-045655-marostegui.json
- 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65849 and previous config saved to /var/cache/conftool/dbconfig/20240705-045145-marostegui.json
- 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65848 and previous config saved to /var/cache/conftool/dbconfig/20240705-044912-marostegui.json
- 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65847 and previous config saved to /var/cache/conftool/dbconfig/20240705-044148-marostegui.json
- 04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65846 and previous config saved to /var/cache/conftool/dbconfig/20240705-042641-marostegui.json
- 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65845 and previous config saved to /var/cache/conftool/dbconfig/20240705-013250-marostegui.json
- 01:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 01:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65844 and previous config saved to /var/cache/conftool/dbconfig/20240705-013229-marostegui.json
- 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65843 and previous config saved to /var/cache/conftool/dbconfig/20240705-011721-marostegui.json
- 01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65842 and previous config saved to /var/cache/conftool/dbconfig/20240705-010214-marostegui.json
- 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65841 and previous config saved to /var/cache/conftool/dbconfig/20240705-004707-marostegui.json
2024-07-04
- 22:04 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 22:03 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65840 and previous config saved to /var/cache/conftool/dbconfig/20240704-220227-marostegui.json
- 22:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 22:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65839 and previous config saved to /var/cache/conftool/dbconfig/20240704-220205-marostegui.json
- 22:01 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 22:00 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 21:59 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 21:59 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
- 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65838 and previous config saved to /var/cache/conftool/dbconfig/20240704-214658-marostegui.json
- 21:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65837 and previous config saved to /var/cache/conftool/dbconfig/20240704-213151-marostegui.json
- 21:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65836 and previous config saved to /var/cache/conftool/dbconfig/20240704-211644-marostegui.json
- 20:17 jdrewniak@deploy1002: Finished scap: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) (duration: 12m 14s)
- 20:12 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Continuing with sync
- 20:08 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:05 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113)
- 19:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad
- 19:55 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad
- 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65835 and previous config saved to /var/cache/conftool/dbconfig/20240704-182308-marostegui.json
- 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 18:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65834 and previous config saved to /var/cache/conftool/dbconfig/20240704-182257-marostegui.json
- 18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65833 and previous config saved to /var/cache/conftool/dbconfig/20240704-180749-marostegui.json
- 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65832 and previous config saved to /var/cache/conftool/dbconfig/20240704-175242-marostegui.json
- 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65831 and previous config saved to /var/cache/conftool/dbconfig/20240704-173735-marostegui.json
- 17:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
- 16:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:15 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
- 16:14 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 16:14 btullis@cumin1002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 16:06 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:02 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
- 15:02 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
- 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65830 and previous config saved to /var/cache/conftool/dbconfig/20240704-143350-marostegui.json
- 14:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 14:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65829 and previous config saved to /var/cache/conftool/dbconfig/20240704-143327-marostegui.json
- 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65827 and previous config saved to /var/cache/conftool/dbconfig/20240704-141820-marostegui.json
- 14:03 Lucas_WMDE: UTC afternoon backport+config window done
- 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65826 and previous config saved to /var/cache/conftool/dbconfig/20240704-140313-marostegui.json
- 14:01 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
- 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65825 and previous config saved to /var/cache/conftool/dbconfig/20240704-140145-root.json
- 13:57 claime: Enabling puppet on cp4037.ulsfo.wmnet to test 1050293 - T367949
- 13:53 claime: disabling puppet on P:trafficserver::backend to merge 1049507 - T367949
- 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65824 and previous config saved to /var/cache/conftool/dbconfig/20240704-134806-marostegui.json
- 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65823 and previous config saved to /var/cache/conftool/dbconfig/20240704-134656-root.json
- 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65822 and previous config saved to /var/cache/conftool/dbconfig/20240704-134639-root.json
- 13:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) (duration: 08m 35s)
- 13:41 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
- 13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65821 and previous config saved to /var/cache/conftool/dbconfig/20240704-134105-marostegui.json
- 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Continuing with sync
- 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:36 claime: Enabling puppet on cp6016.drmrs.wmnet to test 1050293 - T367949
- 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900)
- 13:32 claime: disabling puppet on P:trafficserver::backend to merge 1050293 - T367949
- 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65820 and previous config saved to /var/cache/conftool/dbconfig/20240704-133150-root.json
- 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65819 and previous config saved to /var/cache/conftool/dbconfig/20240704-133133-root.json
- 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65818 and previous config saved to /var/cache/conftool/dbconfig/20240704-132558-marostegui.json
- 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 03s)
- 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
- 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65817 and previous config saved to /var/cache/conftool/dbconfig/20240704-131643-root.json
- 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65816 and previous config saved to /var/cache/conftool/dbconfig/20240704-131628-root.json
- 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65815 and previous config saved to /var/cache/conftool/dbconfig/20240704-131050-marostegui.json
- 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65814 and previous config saved to /var/cache/conftool/dbconfig/20240704-130137-root.json
- 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65813 and previous config saved to /var/cache/conftool/dbconfig/20240704-130122-root.json
- 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65812 and previous config saved to /var/cache/conftool/dbconfig/20240704-125543-marostegui.json
- 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65811 and previous config saved to /var/cache/conftool/dbconfig/20240704-124632-root.json
- 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65810 and previous config saved to /var/cache/conftool/dbconfig/20240704-124617-root.json
- 12:36 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
- 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65808 and previous config saved to /var/cache/conftool/dbconfig/20240704-123127-root.json
- 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65807 and previous config saved to /var/cache/conftool/dbconfig/20240704-123111-root.json
- 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213', diff saved to https://phabricator.wikimedia.org/P65806 and previous config saved to /var/cache/conftool/dbconfig/20240704-122752-root.json
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65805 and previous config saved to /var/cache/conftool/dbconfig/20240704-121631-root.json
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65804 and previous config saved to /var/cache/conftool/dbconfig/20240704-121621-root.json
- 12:11 hashar@deploy1002: Finished scap: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) (duration: 07m 45s)
- 12:06 hashar@deploy1002: hashar, d3r1ck01: Continuing with sync
- 12:06 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:03 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
- 12:02 hashar@deploy1002: Sync cancelled.
- 12:02 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:56 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
- 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65803 and previous config saved to /var/cache/conftool/dbconfig/20240704-115522-marostegui.json
- 11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 11:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
- 11:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
- 11:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 11:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 11:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 11:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 11:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
- 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 db1185 T369250', diff saved to https://phabricator.wikimedia.org/P65802 and previous config saved to /var/cache/conftool/dbconfig/20240704-111324-root.json
- 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65801 and previous config saved to /var/cache/conftool/dbconfig/20240704-105205-marostegui.json
- 10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 10:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65800 and previous config saved to /var/cache/conftool/dbconfig/20240704-105143-marostegui.json
- 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65799 and previous config saved to /var/cache/conftool/dbconfig/20240704-103636-marostegui.json
- 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65798 and previous config saved to /var/cache/conftool/dbconfig/20240704-102129-marostegui.json
- 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65797 and previous config saved to /var/cache/conftool/dbconfig/20240704-100622-marostegui.json
- 09:53 topranks: Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439
- 09:29 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
- 09:24 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
- 09:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1009.eqiad.wmnet with OS bullseye
- 09:23 claime: Manual cleanup of puppet certs for renamed servers mw1417.eqiad.wmnet mw1418.eqiad.wmnet mw2300.codfw.wmnet
- 09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 09:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
- 09:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
- 09:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 09:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.43.0-wmf.12" - T366957
- 09:03 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
- 09:00 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
- 08:59 elukey: restart mcrouter on mwmaint1002
- 08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:45 fabfur: enable puppet on A:cp-ulsfo (T365718)
- 08:45 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1009.eqiad.wmnet with OS bullseye
- 08:44 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
- 08:43 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
- 08:28 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
- 08:24 fabfur: temporarily disable puppet on A:cp-ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051198 (T365718)
- 08:10 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
- 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
- 08:01 fabfur: start rebooting A:cp-eqiad (upload|text in parallel) for T366555
- 07:52 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
- 07:52 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
- 07:41 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
- 07:35 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
- 07:18 dcausse: closing the backport window
- 07:15 dcausse: refreshing the wikitech search indices
- 07:11 dcausse@deploy1002: Finished scap: Backport for cirrus: re-enable search updates on wikitech (duration: 08m 28s)
- 07:06 dcausse@deploy1002: dcausse: Continuing with sync
- 07:05 dcausse@deploy1002: dcausse: Backport for cirrus: re-enable search updates on wikitech synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:02 dcausse@deploy1002: Started scap sync-world: Backport for cirrus: re-enable search updates on wikitech
- 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65794 and previous config saved to /var/cache/conftool/dbconfig/20240704-070100-marostegui.json
- 07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
- 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65793 and previous config saved to /var/cache/conftool/dbconfig/20240704-070038-marostegui.json
- 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65791 and previous config saved to /var/cache/conftool/dbconfig/20240704-063024-marostegui.json
- 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65790 and previous config saved to /var/cache/conftool/dbconfig/20240704-061517-marostegui.json
- 05:11 marostegui: Deploy schema change on db1231 s6 eqiad dbmaint T367856
- 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
- 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
- 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T369020', diff saved to https://phabricator.wikimedia.org/P65789 and previous config saved to /var/cache/conftool/dbconfig/20240704-050334-marostegui.json
- 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T369020', diff saved to https://phabricator.wikimedia.org/P65788 and previous config saved to /var/cache/conftool/dbconfig/20240704-050237-marostegui.json
- 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T369020', diff saved to https://phabricator.wikimedia.org/P65787 and previous config saved to /var/cache/conftool/dbconfig/20240704-050216-marostegui.json
- 05:01 marostegui: Starting s6 eqiad failover from db1231 to db1173 - T369020
- 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
- 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1173 with weight 0 T369020', diff saved to https://phabricator.wikimedia.org/P65786 and previous config saved to /var/cache/conftool/dbconfig/20240704-044429-marostegui.json
- 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
- 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65785 and previous config saved to /var/cache/conftool/dbconfig/20240704-031151-marostegui.json
- 03:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 03:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65784 and previous config saved to /var/cache/conftool/dbconfig/20240704-031129-marostegui.json
- 02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65783 and previous config saved to /var/cache/conftool/dbconfig/20240704-025622-marostegui.json
- 02:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
- 02:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65782 and previous config saved to /var/cache/conftool/dbconfig/20240704-024115-marostegui.json
- 02:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
- 02:31 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
- 02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65781 and previous config saved to /var/cache/conftool/dbconfig/20240704-022608-marostegui.json
- 01:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 01:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65780 and previous config saved to /var/cache/conftool/dbconfig/20240704-014313-marostegui.json
- 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65779 and previous config saved to /var/cache/conftool/dbconfig/20240704-012806-marostegui.json
- 01:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65778 and previous config saved to /var/cache/conftool/dbconfig/20240704-011258-marostegui.json
- 00:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65777 and previous config saved to /var/cache/conftool/dbconfig/20240704-005750-marostegui.json
- 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
- 00:42 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
- 00:29 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
- 00:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
- 00:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
2024-07-03
- 23:47 tzatziki: removing 11 files for legal compliance
- 23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
- 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
- 23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
- 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
- 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
- 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
- 22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
- 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
- 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
- 22:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 21:40 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 21:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 21:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 20:13 cjming: end of UTC late backport window
- 20:11 cjming@deploy1002: Finished scap: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 08m 22s)
- 20:10 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
- 20:06 cjming@deploy1002: kgraessle, cjming: Continuing with sync
- 20:05 cjming@deploy1002: kgraessle, cjming: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 20:03 cjming@deploy1002: Started scap sync-world: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969)
- 19:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:55 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 19:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
- 19:49 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
- 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
- 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
- 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
- 19:25 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
- 19:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
- 19:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
- 19:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
- 19:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
- 19:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
- 19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
- 19:08 SandraEbele_: deploying airflow dags
- 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
- 18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
- 18:36 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:36 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 18:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:50 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 17:49 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 17:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 17:48 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 17:46 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 17:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 17:45 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 17:44 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 17:44 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 17:43 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 17:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 17:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 17:41 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 17:40 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 17:40 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 17:37 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 17:37 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 17:36 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
- 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 17:35 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 17:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 17:35 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 17:34 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 17:34 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 17:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 17:34 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 17:33 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 17:33 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 17:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 17:30 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 17:29 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 17:28 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 17:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 17:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 17:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
- 17:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 17:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 17:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 17:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 17:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 17:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 17:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 17:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 17:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 17:08 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 17:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
- 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
- 16:47 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
- 16:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
- 16:44 jhathaway: adding inbound email servers mx-in{1001,2001} to our MX record
- 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
- 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
- 16:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
- 16:04 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
- 15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 15:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
- 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
- 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
- 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
- 15:32 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:31 sukhe: restart haproxy on dns1005
- 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
- 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
- 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
- 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
- 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
- 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
- 15:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
- 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
- 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
- 15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
- 15:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
- 15:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
- 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
- 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
- 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
- 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
- 14:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
- 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
- 14:51 fabfur: start rebooting A:cp-drmrs (upload|text in parallel) for T366555
- 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
- 14:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 14:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
- 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
- 14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
- 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
- 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
- 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
- 14:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 14:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 14:38 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 14:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 14:35 sukhe: [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
- 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
- 14:33 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
- 14:32 sukhe: sudo cumin "A:wikidough" "run-puppet-agent"
- 14:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
- 14:32 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
- 14:30 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
- 14:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 14:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
- 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
- 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
- 14:25 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:21 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
- 14:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
- 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
- 14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
- 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
- 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
- 14:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 14:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 14:04 topranks: rebooting lsw1-e2-eqiad to install updated JunOS version T365994
- 14:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
- 14:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
- 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
- 13:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
- 13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
- 13:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
- 13:57 jayme@cumin1002: conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
- 13:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
- 13:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
- 13:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
- 13:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
- 13:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
- 13:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
- 13:48 Lucas_WMDE: UTC afternoon backport+config window done
- 13:48 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup (duration: 08m 38s)
- 13:44 jayme: draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
- 13:43 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
- 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup
- 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) (duration: 09m 28s)
- 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
- 13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
- 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
- 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
- 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)
- 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) (duration: 08m 20s)
- 13:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
- 13:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
- 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:19 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
- 13:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
- 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
- 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
- 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
- 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
- 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
- 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
- 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)
- 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 13:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
- 13:15 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 13:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
- 13:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) (duration: 10m 39s)
- 13:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
- 13:04 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243)
- 12:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 12:47 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 12:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 12:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 12:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 12:34 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 12:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 12:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
- 11:55 ladsgroup@deploy1002: Finished scap: Backport for rpc: Update function call in RunSingleJob (T363839) (duration: 08m 08s)
- 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
- 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
- 11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
- 11:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
- 11:49 ladsgroup@deploy1002: ladsgroup: Backport for rpc: Update function call in RunSingleJob (T363839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:47 ladsgroup@deploy1002: Started scap sync-world: Backport for rpc: Update function call in RunSingleJob (T363839)
- 11:45 ladsgroup@deploy1002: Finished scap: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) (duration: 09m 28s)
- 11:40 ladsgroup@deploy1002: volker-e, ladsgroup: Continuing with sync
- 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
- 11:39 ladsgroup@deploy1002: volker-e, ladsgroup: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
- 11:35 ladsgroup@deploy1002: Started scap sync-world: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190)
- 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
- 11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
- 11:21 cgoubert@deploy1002: Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
- 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
- 11:16 cgoubert@deploy1002: Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
- 11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
- 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
- 10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 10:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
- 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
- 09:31 mlitn@deploy1002: Finished scap: Backport for Handle campaigns where wikibase is not enabled (T369085) (duration: 12m 59s)
- 09:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
- 09:26 mlitn@deploy1002: mlitn: Continuing with sync
- 09:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
- 09:21 mlitn@deploy1002: mlitn: Backport for Handle campaigns where wikibase is not enabled (T369085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:20 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 09:20 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 09:20 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
- 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
- 09:20 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
- 09:18 mlitn@deploy1002: Started scap sync-world: Backport for Handle campaigns where wikibase is not enabled (T369085)
- 09:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
- 09:06 topranks: merge host firewall changes to set default DSCP marking (T339850)
- 09:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
- 09:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
- 09:02 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
- 09:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 09:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 09:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 09:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 09:00 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 08:59 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 08:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 08:58 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
- 08:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:53 jayme: deployed istio (adding securityContext) to wikikube clusters - T362978
- 08:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
- 08:51 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
- 08:49 Lucas_WMDE: RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
- 08:46 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
- 08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
- 08:45 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
- 08:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
- 08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
- 08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
- 08:43 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
- 08:42 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
- 08:42 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
- 08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
- 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
- 08:41 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
- 08:41 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
- 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
- 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
- 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
- 08:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
- 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
- 08:40 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
- 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
- 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
- 08:39 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
- 08:39 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 08:39 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 08:39 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 08:38 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 08:35 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
- 08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
- 08:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
- 08:18 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
- 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12 refs T366957
- 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
- 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
- 08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
- 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
- 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
- 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
- 07:36 kart_: Updated MinT to 2024-07-02-060114-production (T364525)
- 07:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
- 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
- 07:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
- 07:21 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
- 07:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
- 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
- 07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
- 07:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
- 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
- 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
- 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
- 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
- 05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
- 05:23 marostegui: Deploy schema change on db2207 s2 codfw dbmaint T367856
- 05:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
- 05:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
- 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
- 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
- 05:20 marostegui: Starting s2 codfw failover from db2207 to db2204 - T369130
- 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
- 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
- 05:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
- 05:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
- 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
- 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
- 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
- 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
- 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
- 04:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
- 03:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
- 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
- 01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 00:48 eileen: civicrm upgraded from 6e03cff2 to 84d6f5d1
- 00:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
- 00:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
- 00:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 00:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json
2024-07-02
- 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65682 and previous config saved to /var/cache/conftool/dbconfig/20240702-234959-marostegui.json
- 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65681 and previous config saved to /var/cache/conftool/dbconfig/20240702-233452-marostegui.json
- 23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65680 and previous config saved to /var/cache/conftool/dbconfig/20240702-231945-marostegui.json
- 22:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 22:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 22:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65679 and previous config saved to /var/cache/conftool/dbconfig/20240702-225835-marostegui.json
- 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65678 and previous config saved to /var/cache/conftool/dbconfig/20240702-224328-marostegui.json
- 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65677 and previous config saved to /var/cache/conftool/dbconfig/20240702-222820-marostegui.json
- 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65676 and previous config saved to /var/cache/conftool/dbconfig/20240702-221312-marostegui.json
- 22:05 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 22:05 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 22:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
- 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
- 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
- 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 22:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 22:01 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 22:01 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 21:58 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 21:58 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 21:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 21:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 21:54 rzl@deploy1002: Finished scap: T369080 (duration: 04m 13s)
- 21:54 rzl@deploy1002: rzl: Continuing with sync
- 21:52 rzl@deploy1002: rzl: T369080 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:51 rzl@deploy1002: Started scap sync-world: T369080
- 21:26 eileen: civicrm upgraded from 08e568e4 to 6e03cff2
- 21:21 eileen: civicrm upgraded from 67bcfd72 to 08e568e4
- 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 20:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 20:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 20:45 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
- 20:39 cmooney@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
- 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 20:34 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
- 20:33 urbanecm@deploy1002: Finished scap: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) (duration: 11m 44s)
- 20:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
- 20:28 urbanecm@deploy1002: arlolra, urbanecm: Continuing with sync
- 20:25 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet
- 20:24 urbanecm@deploy1002: arlolra, urbanecm: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:21 urbanecm@deploy1002: Started scap sync-world: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720)
- 20:21 urbanecm@deploy1002: Finished scap: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) (duration: 16m 31s)
- 20:16 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Continuing with sync
- 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 20:07 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:04 urbanecm@deploy1002: Started scap sync-world: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292)
- 19:45 jhathaway: running another email inbound mx test on mx-in1001
- 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65675 and previous config saved to /var/cache/conftool/dbconfig/20240702-194027-marostegui.json
- 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65674 and previous config saved to /var/cache/conftool/dbconfig/20240702-194005-marostegui.json
- 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65673 and previous config saved to /var/cache/conftool/dbconfig/20240702-192457-marostegui.json
- 19:21 eileen: civicrm upgraded from 64f23ed0 to 67bcfd72
- 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65672 and previous config saved to /var/cache/conftool/dbconfig/20240702-190950-marostegui.json
- 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65671 and previous config saved to /var/cache/conftool/dbconfig/20240702-185443-marostegui.json
- 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
- 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
- 17:20 jforrester@deploy1002: Finished scap: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) (duration: 10m 06s)
- 17:15 jforrester@deploy1002: jforrester: Continuing with sync
- 17:14 jforrester@deploy1002: jforrester: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 17:10 jforrester@deploy1002: Started scap sync-world: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010)
- 17:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 17:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 17:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 17:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 17:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 17:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 17:06 mutante: lists1004 - sudo systemctl start wmf_auto_restart_exim4 (T369017)
- 16:54 ejegg: fundraising civicrm upgraded from 41c1bd78 to 64f23ed0
- 16:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
- 16:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
- 16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
- 16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
- 15:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
- 15:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
- 15:51 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
- 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 15:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
- 15:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
- 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
- 15:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
- 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65670 and previous config saved to /var/cache/conftool/dbconfig/20240702-154127-marostegui.json
- 15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65669 and previous config saved to /var/cache/conftool/dbconfig/20240702-154105-marostegui.json
- 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65668 and previous config saved to /var/cache/conftool/dbconfig/20240702-152558-marostegui.json
- 15:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
- 15:12 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
- 15:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
- 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65667 and previous config saved to /var/cache/conftool/dbconfig/20240702-151050-marostegui.json
- 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[2004-2006].codfw.wmnet
- 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
- 15:02 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 14:58 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 14:58 jiji@cumin1002: START - Cookbook sre.dns.netbox
- 14:58 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
- 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
- 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
- 14:55 fabfur: upgrading A:cp-esams to haproxy 2.8.10 (T367756)
- 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65666 and previous config saved to /var/cache/conftool/dbconfig/20240702-145542-marostegui.json
- 14:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 14:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 14:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 14:52 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
- 14:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 14:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet
- 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 14:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 14:48 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 14:47 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet
- 14:45 jiji@cumin1002: START - Cookbook sre.dns.netbox
- 14:38 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
- 14:37 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet
- 14:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
- 14:19 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet
- 14:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
- 14:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom
- 14:12 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
- 14:11 jiji@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom
- 14:11 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
- 14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 14:06 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns
- 14:06 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
- 14:05 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
- 14:05 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 14:05 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 14:05 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
- 14:05 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
- 14:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
- 14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
- 14:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
- 14:04 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 14:03 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 14:03 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 14:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
- 14:02 sukhe: restart anycast-hc on dns6001
- 14:01 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
- 13:58 effie: decom old eqiad and codfw kubetcd hosts
- 13:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 13:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 13:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 13:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 13:42 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 13:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 13:41 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
- 13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
- 13:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc
- 13:35 claime: Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074
- 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json
- 13:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json
- 13:30 Lucas_WMDE: UTC afternoon backport+config window done
- 13:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) (duration: 10m 22s)
- 13:22 claime: homer 'cr*codfw*' commit 'T351074'
- 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Continuing with sync
- 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[1001-1002].eqiad.wmnet
- 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:18 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877)
- 13:16 jiji@cumin1002: START - Cookbook sre.dns.netbox
- 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65663 and previous config saved to /var/cache/conftool/dbconfig/20240702-131531-marostegui.json
- 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable EntitySchema data type on Wikidata (T332157) (duration: 10m 54s)
- 13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2032.codfw.wmnet with OS bullseye
- 13:09 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[1001-1002].eqiad.wmnet
- 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
- 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
- 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
- 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Enable EntitySchema data type on Wikidata (T332157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
- 13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable EntitySchema data type on Wikidata (T332157)
- 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65662 and previous config saved to /var/cache/conftool/dbconfig/20240702-130024-marostegui.json
- 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
- 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
- 12:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
- 12:55 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=kubemaster100[1-2].eqiad.wmnet
- 12:49 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster100[1-2].eqiad.wmnet
- 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
- 12:46 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
- 12:46 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
- 12:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
- 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65661 and previous config saved to /var/cache/conftool/dbconfig/20240702-124517-marostegui.json
- 12:44 effie: decom eqiad old kubemasters - T353464
- 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
- 12:41 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes1051.eqiad.wmnet
- 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
- 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
- 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
- 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
- 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
- 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
- 12:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
- 12:25 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
- 12:25 marostegui: Deploy schema change on db2129 s6 codfw dbmaint T367856
- 12:25 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 12:24 jforrester@deploy1002: Finished scap: Backport for Reference widget: check for undefined config (T368736) (duration: 09m 59s)
- 12:19 jforrester@deploy1002: jforrester: Continuing with sync
- 12:19 jforrester@deploy1002: jforrester: Backport for Reference widget: check for undefined config (T368736) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2034.codfw.wmnet with OS bullseye
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2032.codfw.wmnet with OS bullseye
- 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
- 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
- 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2393 to wikikube-worker2034
- 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2034
- 12:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2034
- 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
- 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65660 and previous config saved to /var/cache/conftool/dbconfig/20240702-121638-root.json
- 12:16 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
- 12:16 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
- 12:16 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:15 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:15 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
- 12:14 jforrester@deploy1002: Started scap sync-world: Backport for Reference widget: check for undefined config (T368736)
- 12:11 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 12:11 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2393 to wikikube-worker2034
- 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2392 to wikikube-worker2033
- 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
- 12:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
- 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
- 12:09 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
- 12:08 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
- 12:07 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
- 12:07 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
- 12:05 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2392 to wikikube-worker2033
- 12:05 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
- 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2365 to wikikube-worker2032
- 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2032
- 12:03 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2032
- 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
- 12:01 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
- 12:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65659 and previous config saved to /var/cache/conftool/dbconfig/20240702-120133-root.json
- 12:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 12:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 11:59 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2365 to wikikube-worker2032
- 11:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2309 to wikikube-worker2031
- 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
- 11:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
- 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
- 11:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 11:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
- 11:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
- 11:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 11:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2309 to wikikube-worker2031
- 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2307 to wikikube-worker2030
- 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
- 11:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
- 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65658 and previous config saved to /var/cache/conftool/dbconfig/20240702-115026-marostegui.json
- 11:50 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
- 11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 11:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65657 and previous config saved to /var/cache/conftool/dbconfig/20240702-115003-marostegui.json
- 11:48 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
- 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65656 and previous config saved to /var/cache/conftool/dbconfig/20240702-114627-root.json
- 11:44 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
- 11:43 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
- 11:43 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2307 to wikikube-worker2030
- 11:37 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
- 11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
- 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65655 and previous config saved to /var/cache/conftool/dbconfig/20240702-113457-marostegui.json
- 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65654 and previous config saved to /var/cache/conftool/dbconfig/20240702-113122-root.json
- 11:27 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
- 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
- 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json
- 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json
- 11:24 marostegui: Starting s6 codfw failover from db2129 to db2214 - T369021
- 11:24 jayme: switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507
- 11:23 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
- 11:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
- 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
- 11:21 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
- 11:21 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:21 claime: Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet
- 11:20 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json
- 11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
- 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json
- 11:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
- 11:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
- 11:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
- 11:12 claime: pooling and uncordoning wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet - T351074
- 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet
- 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 11:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
- 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json
- 11:07 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
- 11:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
- 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json
- 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json
- 10:56 jiji@cumin1002: START - Cookbook sre.dns.netbox
- 10:50 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet
- 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json
- 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 10:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 10:41 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 10:35 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
- 10:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet
- 10:32 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
- 10:28 fabfur: upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
- 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
- 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
- 10:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet
- 10:06 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json
- 10:02 claime: homer 'cr*codfw*' commit 'T351074'
- 09:53 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet
- 09:52 elukey: volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm
- 09:48 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
- 09:47 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
- 09:20 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
- 09:15 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65644 and previous config saved to /var/cache/conftool/dbconfig/20240702-091508-jynus.json
- 08:57 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65643 and previous config saved to /var/cache/conftool/dbconfig/20240702-085733-jynus.json
- 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65642 and previous config saved to /var/cache/conftool/dbconfig/20240702-084447-marostegui.json
- 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65641 and previous config saved to /var/cache/conftool/dbconfig/20240702-084425-marostegui.json
- 08:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009.*} and A:cp
- 08:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009.*} and A:cp
- 08:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
- 08:34 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.12 refs T366957
- 08:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
- 08:30 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet
- 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65640 and previous config saved to /var/cache/conftool/dbconfig/20240702-082918-marostegui.json
- 08:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2031.*} and A:cp
- 08:20 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2031.*} and A:cp
- 08:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2030.*} and A:cp
- 08:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 08:15 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2030.*} and A:cp
- 08:15 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 08:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2028.*} and A:cp
- 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65639 and previous config saved to /var/cache/conftool/dbconfig/20240702-081411-marostegui.json
- 08:13 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2028.*} and A:cp
- 08:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027.*} and A:cp
- 08:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027.*} and A:cp
- 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65638 and previous config saved to /var/cache/conftool/dbconfig/20240702-081025-marostegui.json
- 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
- 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65637 and previous config saved to /var/cache/conftool/dbconfig/20240702-080948-marostegui.json
- 08:07 jayme: draining kubernetes1051.eqiad.wmnet
- 08:07 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
- 08:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
- 08:01 jayme: cordon kubernetes1051.eqiad.wmnet because of several failed image pulls
- 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65635 and previous config saved to /var/cache/conftool/dbconfig/20240702-075904-marostegui.json
- 07:58 kharlan@deploy1002: Finished scap: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) (duration: 41m 45s)
- 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65634 and previous config saved to /var/cache/conftool/dbconfig/20240702-075440-marostegui.json
- 07:52 kharlan@deploy1002: kharlan: Continuing with sync
- 07:51 kharlan@deploy1002: kharlan: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65633 and previous config saved to /var/cache/conftool/dbconfig/20240702-073933-marostegui.json
- 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65632 and previous config saved to /var/cache/conftool/dbconfig/20240702-072426-marostegui.json
- 07:16 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
- 07:06 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
- 07:01 oblivian@deploy1002: Finished scap: Rebuilding images for change to the base image for httpd (duration: 26m 52s)
- 06:59 XioNoX: update netboot bookworm image to pick up new point release
- 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65631 and previous config saved to /var/cache/conftool/dbconfig/20240702-065831-root.json
- 06:35 oblivian@deploy1002: Started scap sync-world: Rebuilding images for change to the base image for httpd
- 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65629 and previous config saved to /var/cache/conftool/dbconfig/20240702-062820-root.json
- 06:21 _joe_: rebuilding httpd-fcgi, mediawiki-httpd images T363342 T368640
- 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65628 and previous config saved to /var/cache/conftool/dbconfig/20240702-061315-root.json
- 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65627 and previous config saved to /var/cache/conftool/dbconfig/20240702-055809-root.json
- 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65626 and previous config saved to /var/cache/conftool/dbconfig/20240702-054304-root.json
- 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65625 and previous config saved to /var/cache/conftool/dbconfig/20240702-052759-root.json
- 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 T368371', diff saved to https://phabricator.wikimedia.org/P65624 and previous config saved to /var/cache/conftool/dbconfig/20240702-052543-root.json
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T368371', diff saved to https://phabricator.wikimedia.org/P65623 and previous config saved to /var/cache/conftool/dbconfig/20240702-052447-marostegui.json
- 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T368371', diff saved to https://phabricator.wikimedia.org/P65622 and previous config saved to /var/cache/conftool/dbconfig/20240702-052408-marostegui.json
- 05:23 marostegui: Starting s8 eqiad failover from db1192 to db1209 - T368371
- 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 remove from API T368371', diff saved to https://phabricator.wikimedia.org/P65621 and previous config saved to /var/cache/conftool/dbconfig/20240702-045929-marostegui.json
- 04:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
- 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 with weight 0 T368371', diff saved to https://phabricator.wikimedia.org/P65620 and previous config saved to /var/cache/conftool/dbconfig/20240702-045856-marostegui.json
- 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
- 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65619 and previous config saved to /var/cache/conftool/dbconfig/20240702-043349-marostegui.json
- 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65618 and previous config saved to /var/cache/conftool/dbconfig/20240702-043326-marostegui.json
- 04:22 eileen: civicrm upgraded from f6af6380 to 41c1bd78
- 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65617 and previous config saved to /var/cache/conftool/dbconfig/20240702-041819-marostegui.json
- 04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65616 and previous config saved to /var/cache/conftool/dbconfig/20240702-040705-marostegui.json
- 04:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 04:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65615 and previous config saved to /var/cache/conftool/dbconfig/20240702-040643-marostegui.json
- 04:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65614 and previous config saved to /var/cache/conftool/dbconfig/20240702-040312-marostegui.json
- 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.9 (duration: 01m 02s)
- 03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.12 refs T366957 (duration: 51m 33s)
- 03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65613 and previous config saved to /var/cache/conftool/dbconfig/20240702-035135-marostegui.json
- 03:51 eileen: civicrm upgraded from 52dc4f1d to f6af6380
- 03:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65612 and previous config saved to /var/cache/conftool/dbconfig/20240702-034805-marostegui.json
- 03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65611 and previous config saved to /var/cache/conftool/dbconfig/20240702-033628-marostegui.json
- 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65610 and previous config saved to /var/cache/conftool/dbconfig/20240702-032121-marostegui.json
- 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.12 refs T366957
- 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65609 and previous config saved to /var/cache/conftool/dbconfig/20240702-004524-marostegui.json
- 00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65608 and previous config saved to /var/cache/conftool/dbconfig/20240702-004502-marostegui.json
- 00:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65607 and previous config saved to /var/cache/conftool/dbconfig/20240702-002955-marostegui.json
- 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
- 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 00:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65606 and previous config saved to /var/cache/conftool/dbconfig/20240702-001448-marostegui.json
- 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
- 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
2024-07-01
- 23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65605 and previous config saved to /var/cache/conftool/dbconfig/20240701-235941-marostegui.json
- 23:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
- 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
- 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
- 23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
- 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
- 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
- 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
- 23:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
- 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
- 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
- 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
- 23:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
- 23:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
- 23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
- 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1038
- 22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1038
- 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
- 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 22:10 sbassett@deploy1002: Synchronized private/PrivateSettings.php: Un-deployed a PS.php mitigation for T341908 (duration: 07m 24s)
- 21:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
- 21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
- 21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
- 21:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
- 21:55 maryum: deployed patch for T366991
- 21:39 eileen: civicrm upgraded from f8b1f5c4 to 52dc4f1d
- 21:39 eileen: tools upgraded from c51f6e62 to 95f10b20
- 21:32 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Yann . # T368703
- 21:24 cjming: end of UTC late backport window
- 21:23 cjming@deploy1002: Finished scap: Backport for extension-list: Add Metrics Platform (T366234) (duration: 28m 16s)
- 21:16 cjming@deploy1002: cjming: Continuing with sync
- 21:16 cjming@deploy1002: cjming: Backport for extension-list: Add Metrics Platform (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65604 and previous config saved to /var/cache/conftool/dbconfig/20240701-210534-marostegui.json
- 21:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 21:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65603 and previous config saved to /var/cache/conftool/dbconfig/20240701-210512-marostegui.json
- 21:04 ejegg: fundraising civicrm upgraded from f9782670 to f8b1f5c4
- 20:55 cjming@deploy1002: Started scap sync-world: Backport for extension-list: Add Metrics Platform (T366234)
- 20:53 cjming@deploy1002: Finished scap: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) (duration: 09m 03s)
- 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65602 and previous config saved to /var/cache/conftool/dbconfig/20240701-205003-marostegui.json
- 20:47 cjming@deploy1002: cjming, pppery: Continuing with sync
- 20:47 cjming@deploy1002: cjming, pppery: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:44 cjming@deploy1002: Started scap sync-world: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915)
- 20:42 cjming@deploy1002: Finished scap: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) (duration: 10m 39s)
- 20:36 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
- 20:35 ejegg: standalone SmashPig upgraded from c8993ec6 to 565c61e4
- 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65601 and previous config saved to /var/cache/conftool/dbconfig/20240701-203456-marostegui.json
- 20:34 cjming@deploy1002: cjming, jdlrobson: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:31 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483)
- 20:30 cjming@deploy1002: Sync cancelled.
- 20:28 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:26 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
- 20:23 cjming@deploy1002: Sync cancelled.
- 20:23 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65600 and previous config saved to /var/cache/conftool/dbconfig/20240701-201949-marostegui.json
- 20:03 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
- 19:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 19:19 dancy@deploy1002: Installation of scap version "4.91.0" completed for 233 hosts
- 19:19 dancy@deploy1002: Installing scap version "4.91.0" for 233 hosts
- 19:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
- 19:15 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
- 19:14 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
- 19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
- 18:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
- 18:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 18:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 17:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
- 17:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
- 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
- 17:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 17:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 17:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 17:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 17:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 17:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
- 17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
- 17:35 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
- 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
- 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65599 and previous config saved to /var/cache/conftool/dbconfig/20240701-171609-marostegui.json
- 17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 17:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 17:05 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 17:04 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 16:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 16:51 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 16:51 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 16:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
- 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
- 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
- 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
- 16:33 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
- 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65598 and previous config saved to /var/cache/conftool/dbconfig/20240701-163010-marostegui.json
- 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65597 and previous config saved to /var/cache/conftool/dbconfig/20240701-162948-marostegui.json
- 16:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
- 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1039
- 16:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1039
- 16:18 urandom: restarting Cassandra —restbase2023-{a,b,c}— troubleshooting storage utilization
- 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
- 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65596 and previous config saved to /var/cache/conftool/dbconfig/20240701-161441-marostegui.json
- 16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
- 16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
- 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65595 and previous config saved to /var/cache/conftool/dbconfig/20240701-155934-marostegui.json
- 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65594 and previous config saved to /var/cache/conftool/dbconfig/20240701-154427-marostegui.json
- 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65593 and previous config saved to /var/cache/conftool/dbconfig/20240701-153758-root.json
- 15:37 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:32 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
- 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65592 and previous config saved to /var/cache/conftool/dbconfig/20240701-152253-root.json
- 15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
- 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:16 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
- 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 15:14 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65591 and previous config saved to /var/cache/conftool/dbconfig/20240701-150747-root.json
- 15:07 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
- 15:05 akosiaris: reboot deploy1003 T364416
- 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
- 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
- 14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 14:55 claime: deploying statsd-exporter for mw-web - T365265
- 14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
- 14:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65590 and previous config saved to /var/cache/conftool/dbconfig/20240701-145242-root.json
- 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
- 14:48 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:48 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 14:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
- 14:44 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 14:43 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 14:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 14:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65589 and previous config saved to /var/cache/conftool/dbconfig/20240701-143736-root.json
- 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
- 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
- 14:35 fabfur: upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
- 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
- 14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 14:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
- 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65587 and previous config saved to /var/cache/conftool/dbconfig/20240701-142231-root.json
- 14:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 14:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65586 and previous config saved to /var/cache/conftool/dbconfig/20240701-141640-marostegui.json
- 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
- 14:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65585 and previous config saved to /var/cache/conftool/dbconfig/20240701-140725-root.json
- 14:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
- 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65584 and previous config saved to /var/cache/conftool/dbconfig/20240701-140133-marostegui.json
- 13:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:56 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:48 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 13:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65583 and previous config saved to /var/cache/conftool/dbconfig/20240701-134626-marostegui.json
- 13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
- 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1040
- 13:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1040
- 13:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65581 and previous config saved to /var/cache/conftool/dbconfig/20240701-133118-marostegui.json
- 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
- 13:30 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
- 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
- 13:29 elukey@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: sync
- 13:29 urbanecm: mwmaint1002: [urbanecm@mwmaint1002 ~]$ foreachwiki DiscussionTools:FixTrailingWhitespaceIds (T356196)
- 13:27 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 13:27 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 13:26 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:26 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:25 urbanecm@deploy1002: Finished scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196) (duration: 08m 46s)
- 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
- 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 13:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
- 13:16 urbanecm@deploy1002: Started scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196)
- 13:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki map (T368862) (duration: 09m 01s)
- 13:14 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 13:10 urbanecm@deploy1002: urbanecm: Continuing with sync
- 13:10 urbanecm@deploy1002: urbanecm: Backport for Update interwiki map (T368862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 13:07 urbanecm@deploy1002: Started scap: Backport for Update interwiki map (T368862)
- 12:56 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 12:56 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 12:55 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 12:51 claime: Running update-netboot-image bullseye for 11.10 release on puppetserver1001
- 12:49 fabfur: upgrading A:cp-magru to haproxy 2.8.10 (T367756)
- 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
- 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
- 12:39 claime: Running update-netboot-image bullseye for 11.10 release
- 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 12:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 12:30 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 12:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 12:27 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 12:23 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 12:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 12:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 12:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 12:19 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:17 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:14 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:12 daniel@deploy1002: Finished scap: Backport for REST: detect mismatching value types in json request (T305973) (duration: 32m 48s)
- 12:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:06 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 12:04 daniel@deploy1002: daniel: Continuing with sync
- 12:03 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 12:01 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 12:00 daniel@deploy1002: daniel: Backport for REST: detect mismatching value types in json request (T305973) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 11:58 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 11:51 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 11:49 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
- 11:46 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 11:45 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:45 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging FebinBellamy out of all services on: 2188 hosts
- 11:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 11:43 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging FebinBellamy out of all services on: 2188 hosts
- 11:41 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 2188 hosts
- 11:41 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 2188 hosts
- 11:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 11:39 daniel@deploy1002: Started scap: Backport for REST: detect mismatching value types in json request (T305973)
- 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
- 11:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
- 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
- 11:29 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
- 11:27 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
- 11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
- 10:57 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 10:49 claime: running /usr/local/bin/apply-config-kartotherian on maps-master
- 10:47 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 10:47 claime: running /usr/local/bin/apply-config-kartotherian on maps-replica
- 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:43 claime: running puppet on maps servers
- 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
- 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
- 10:38 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:37 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
- 10:37 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
- 10:37 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
- 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65580 and previous config saved to /var/cache/conftool/dbconfig/20240701-102633-marostegui.json
- 10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
- 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65579 and previous config saved to /var/cache/conftool/dbconfig/20240701-102611-marostegui.json
- 10:23 fabfur: upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
- 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65578 and previous config saved to /var/cache/conftool/dbconfig/20240701-101104-marostegui.json
- 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65577 and previous config saved to /var/cache/conftool/dbconfig/20240701-095557-marostegui.json
- 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65576 and previous config saved to /var/cache/conftool/dbconfig/20240701-094547-root.json
- 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65575 and previous config saved to /var/cache/conftool/dbconfig/20240701-094341-root.json
- 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65574 and previous config saved to /var/cache/conftool/dbconfig/20240701-094050-marostegui.json
- 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65573 and previous config saved to /var/cache/conftool/dbconfig/20240701-093042-root.json
- 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65572 and previous config saved to /var/cache/conftool/dbconfig/20240701-092835-root.json
- 09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 09:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65570 and previous config saved to /var/cache/conftool/dbconfig/20240701-091536-root.json
- 09:14 urbanecm@deploy1002: Finished scap: Backport for JsonSchemaValidator: Measure duration (T365245) (duration: 22m 15s)
- 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65569 and previous config saved to /var/cache/conftool/dbconfig/20240701-091329-root.json
- 09:06 urbanecm@deploy1002: urbanecm: Continuing with sync
- 09:06 urbanecm@deploy1002: urbanecm: Backport for JsonSchemaValidator: Measure duration (T365245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
- 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65568 and previous config saved to /var/cache/conftool/dbconfig/20240701-090031-root.json
- 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65567 and previous config saved to /var/cache/conftool/dbconfig/20240701-085824-root.json
- 08:51 urbanecm@deploy1002: Started scap: Backport for JsonSchemaValidator: Measure duration (T365245)
- 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65566 and previous config saved to /var/cache/conftool/dbconfig/20240701-084525-root.json
- 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65565 and previous config saved to /var/cache/conftool/dbconfig/20240701-084318-root.json
- 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65564 and previous config saved to /var/cache/conftool/dbconfig/20240701-083020-root.json
- 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65563 and previous config saved to /var/cache/conftool/dbconfig/20240701-082813-root.json
- 08:18 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65562 and previous config saved to /var/cache/conftool/dbconfig/20240701-081811-jynus.json
- 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65561 and previous config saved to /var/cache/conftool/dbconfig/20240701-081514-root.json
- 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65560 and previous config saved to /var/cache/conftool/dbconfig/20240701-081307-root.json
- 08:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
- 07:44 elukey: `apt-get clean` on build2001 to free some space in the root partition
- 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1195 in s1 T368871', diff saved to https://phabricator.wikimedia.org/P65559 and previous config saved to /var/cache/conftool/dbconfig/20240701-070243-marostegui.json
- 06:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
- 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T368871', diff saved to https://phabricator.wikimedia.org/P65558 and previous config saved to /var/cache/conftool/dbconfig/20240701-063601-root.json
- 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65557 and previous config saved to /var/cache/conftool/dbconfig/20240701-063344-marostegui.json
- 06:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 06:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
- 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
- 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
- 04:56 marostegui: Failover m2 from db1195 to db1228 - T368494
- 04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
- 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
- 04:50 marostegui: dbmaint eqiad Rebuild pagelinks table on s8 master T364069
- 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65556 and previous config saved to /var/cache/conftool/dbconfig/20240701-044945-marostegui.json
- 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
Other archives
2000s
- Archive 1: 2004 Jun - 2004 Sep
- Archive 2: 2004 Oct - 2004 Nov
- Archive 3: 2004 Dec - 2005 Mar
- Archive 4: 2005 Apr - 2005 Jul
- Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
- Archive 6: 2005 Nov - 2006 Feb
- Archive 7: 2006 Mar - 2006 Jun
- Archive 8: 2006 Jul - 2006 Sep
- Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
- Archive 10: 2007 Feb - 2007 Jun
- Archive 11: 2007 Jul - 2007 Dec
- Archive 12: 2008 Jan - 2008 Jul
- Archive 12a: 2008 Aug
- Archive 12b: 2008 Sept
- Archive 13: 2008 Oct - 2009 Jun
- Archive 14: 2009 Jun - 2009 Dec
2010s
- Archive 15: 2010 Jan - 2010 Jun
- Archive 16: 2010 Jul - 2010 Oct
- Archive 17: 2010 Nov - 2010 Dec
- Archive 18: 2011 Jan - 2011 Jun
- Archive 19: 2011 Jul - 2011 Dec
- Archive 20: 2011 Dec - 2012 Jun, with revision history 2007-02-21 to 2012-03-27
- Archive 21: 2012 Jul - 2013 Jan
- Archive 22: 2013 Jan - 2013 Jul
- Archive 23: 2013 Aug - 2013 Dec
- Archive 24: 2014 Jan - 2014 Mar
- Archive 25: 2014 April - 2014 September
- Archive 26: 2014 October - 2014 December
- Archive 27: 2015 January - 2015 July
- Archive 28: 2015 August - 2015 December
- Archive 29: 2016 January - 2016 May
- Archive 30: 2016 June - 2016 August
- Archive 31: 2016 September - 2016 December
- Archive 32: 2017 January - 2017 July
- Archive 33: 2017 August - 2017 December
- Archive 34: 2018 January - 2018 April
- Archive 35: 2018 May - 2018 August
- Archive 36: 2018 September - 2018 December
- Archive 37: 2019 January - 2019 April
- Archive 38: 2019 May - 2019 August
- Archive 39: 2019 September - 2019 December
2020s
- Archive 40: 2020 January - 2020 April
- Archive 41: 2020 May - 2020 July
- Archive 42: 2020 August - 2020 November
- Archive 43: 2020 December
- Archive 44: 2021 January - 2021 April
- Archive 45: 2021 May - 2021 July
- Archive 46: 2021 August - 2021 October
- Archive 47: 2021 November - 2021 December
- Archive 48: 2022 January
- Archive 49: 2022 February
- Archive 50: 2022 March
- Archive 51: 2022 April 1-15
- Archive 52: 2022 April 16-30
- Archive 53: 2022 May
- Archive 54: 2022 June
- Archive 55: 2022 July
- Archive 56: 2022 August
- Archive 57: 2022 September
- Archive 58: 2022 October
- Archive 59: 2022 November 1-15
- Archive 60: 2022 November 16-30
- Archive 61: 2022 December
- Archive 62: 2023 January
- Archive 63: 2023 February
- Archive 64: 2023 March
- Archive 65: 2023 April
- Archive 66: 2023 May
- Archive 67: 2023 June
- Archive 68: 2023 July
- Archive 69: 2023 August 1-15
- Archive 70: 2023 August 16-31
- Archive 71: 2023 September
- Archive 72: 2023 October
- Archive 73: 2023 November
- Archive 74: 2023 December
- Archive 75: 2024 January
- Archive 76: 2024 February
- Archive 77: 2024 March
- Archive 78: 2024 April
- Archive 79: 2024 May 1-15
- Archive 80: 2024 May 16-31
- Archive 81: 2024 June 1-15
- Archive 82: 2024 June 16-30
- Archive 83: 2024 July
- Archive 84: 2024 August
- Archive 85: 2024 September
- Archive 86: 2024 October
- Archive 87: 2024 November