Server Admin Log/Archive 68

2023-07-31

23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49860 and previous config saved to /var/cache/conftool/dbconfig/20230731-235442-ladsgroup.json
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49859 and previous config saved to /var/cache/conftool/dbconfig/20230731-233039-ladsgroup.json
23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49858 and previous config saved to /var/cache/conftool/dbconfig/20230731-233018-ladsgroup.json
23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49857 and previous config saved to /var/cache/conftool/dbconfig/20230731-231512-ladsgroup.json
23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49856 and previous config saved to /var/cache/conftool/dbconfig/20230731-230006-ladsgroup.json
22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49855 and previous config saved to /var/cache/conftool/dbconfig/20230731-224500-ladsgroup.json
22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49854 and previous config saved to /var/cache/conftool/dbconfig/20230731-223547-ladsgroup.json
22:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
22:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49853 and previous config saved to /var/cache/conftool/dbconfig/20230731-223526-ladsgroup.json
22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49852 and previous config saved to /var/cache/conftool/dbconfig/20230731-222020-ladsgroup.json
22:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49851 and previous config saved to /var/cache/conftool/dbconfig/20230731-220514-ladsgroup.json
21:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49850 and previous config saved to /var/cache/conftool/dbconfig/20230731-215008-ladsgroup.json
21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49849 and previous config saved to /var/cache/conftool/dbconfig/20230731-213017-ladsgroup.json
21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49848 and previous config saved to /var/cache/conftool/dbconfig/20230731-212941-ladsgroup.json
21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49847 and previous config saved to /var/cache/conftool/dbconfig/20230731-211435-ladsgroup.json
20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49846 and previous config saved to /var/cache/conftool/dbconfig/20230731-205928-ladsgroup.json
20:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49845 and previous config saved to /var/cache/conftool/dbconfig/20230731-204422-ladsgroup.json
20:37 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49844 and previous config saved to /var/cache/conftool/dbconfig/20230731-203451-ladsgroup.json
20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49843 and previous config saved to /var/cache/conftool/dbconfig/20230731-203413-ladsgroup.json
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49842 and previous config saved to /var/cache/conftool/dbconfig/20230731-201907-ladsgroup.json
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49841 and previous config saved to /var/cache/conftool/dbconfig/20230731-200401-ladsgroup.json
19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49840 and previous config saved to /var/cache/conftool/dbconfig/20230731-194854-ladsgroup.json
19:03 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@47f9458]: (no justification provided) (duration: 00m 16s)
19:03 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@47f9458]: (no justification provided)
18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49839 and previous config saved to /var/cache/conftool/dbconfig/20230731-184200-ladsgroup.json
18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49838 and previous config saved to /var/cache/conftool/dbconfig/20230731-184140-ladsgroup.json
18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49837 and previous config saved to /var/cache/conftool/dbconfig/20230731-182633-ladsgroup.json
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49836 and previous config saved to /var/cache/conftool/dbconfig/20230731-182114-ladsgroup.json
18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49835 and previous config saved to /var/cache/conftool/dbconfig/20230731-181127-ladsgroup.json
18:00 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
17:59 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:59 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:57 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49834 and previous config saved to /var/cache/conftool/dbconfig/20230731-175621-ladsgroup.json
17:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1108.eqiad.wmnet
17:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
17:02 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
17:00 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
17:00 btullis@cumin1001: Added views for new wiki: gpewiki T338678
16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49833 and previous config saved to /var/cache/conftool/dbconfig/20230731-164759-ladsgroup.json
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:47 btullis@cumin1001: START - Cookbook sre.dns.netbox
16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49832 and previous config saved to /var/cache/conftool/dbconfig/20230731-164738-ladsgroup.json
16:42 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1108.eqiad.wmnet
16:34 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49831 and previous config saved to /var/cache/conftool/dbconfig/20230731-163232-ladsgroup.json
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1003.eqiad.wmnet
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:30 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
16:28 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
16:25 btullis@cumin1001: START - Cookbook sre.dns.netbox
16:19 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1003.eqiad.wmnet
16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49830 and previous config saved to /var/cache/conftool/dbconfig/20230731-161726-ladsgroup.json
16:08 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 24s)
16:07 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49829 and previous config saved to /var/cache/conftool/dbconfig/20230731-160500-ladsgroup.json
16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49828 and previous config saved to /var/cache/conftool/dbconfig/20230731-160220-ladsgroup.json
16:01 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 44s)
15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49827 and previous config saved to /var/cache/conftool/dbconfig/20230731-154954-ladsgroup.json
15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49826 and previous config saved to /var/cache/conftool/dbconfig/20230731-153448-ladsgroup.json
15:20 volans: deploying python3-wmflib fleet wide
15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49825 and previous config saved to /var/cache/conftool/dbconfig/20230731-151942-ladsgroup.json
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49824 and previous config saved to /var/cache/conftool/dbconfig/20230731-145252-ladsgroup.json
14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49823 and previous config saved to /var/cache/conftool/dbconfig/20230731-145232-ladsgroup.json
14:47 sukhe: finished rolling out gdnsd 3.99.0~alpha2 upgrade
14:45 fabfur: imported prometheus-rdkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/prometheus-rdkafka-exporter/+/942613) T342154
14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49821 and previous config saved to /var/cache/conftool/dbconfig/20230731-143725-ladsgroup.json
14:32 fabfur: imported file-read-backwards package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/file-read-backwards/+/942491) T342154
14:31 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
14:26 volans: uploaded python3-wmflib_1.2.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
14:25 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49820 and previous config saved to /var/cache/conftool/dbconfig/20230731-142220-ladsgroup.json
14:21 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
14:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:09 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) (duration: 07m 27s)
14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49819 and previous config saved to /var/cache/conftool/dbconfig/20230731-140713-ladsgroup.json
14:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:03 jforrester@deploy1002: jforrester: Continuing with sync
14:03 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:02 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)
13:59 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.0-1+wmf12u1_amd64.changes: T342154
13:57 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) (duration: 07m 38s)
13:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:52 sukhe: reprepro -C main include bullseye-wikimedia gdnsd_3.99.0~alpha2-1_amd64.changes
13:51 jforrester@deploy1002: jforrester: Continuing with sync
13:51 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:50 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931)
13:49 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: T325908 (duration: 06m 25s)
13:45 moritzm: install gtk+3.0 bugfix updates from Bullseye 11.7 point release
13:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:42 fabfur: imported fifo-log-demux package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/fifo-log-demux/+/942414) T342154
13:38 jforrester@deploy1002: Finished scap: Backport for Remove F: namespace alias (T325910) (duration: 24m 24s)
13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49818 and previous config saved to /var/cache/conftool/dbconfig/20230731-133707-root.json
13:29 jforrester@deploy1002: jforrester and epicpupper: Continuing with sync
13:29 jforrester@deploy1002: jforrester and epicpupper: Backport for Remove F: namespace alias (T325910) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:24 moritzm: imported jenkins 2.401.3 to thirdparty/ci for bullseye-wikimedia T342572
13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49817 and previous config saved to /var/cache/conftool/dbconfig/20230731-132201-root.json
13:14 jforrester@deploy1002: Started scap: Backport for Remove F: namespace alias (T325910)
13:13 James_F: WikiLambda backport verified for T342891 T342687 T341500 T343006 T342901 and T343041
13:09 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: (no justification provided) (duration: 07m 16s)
13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49816 and previous config saved to /var/cache/conftool/dbconfig/20230731-130657-root.json
13:00 jnuche: CI Jenkins upgraded to 2.401.3: https://phabricator.wikimedia.org/T342572
12:57 moritzm: installing 6.1.38 kernels on Bookworm hosts
12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49815 and previous config saved to /var/cache/conftool/dbconfig/20230731-125513-ladsgroup.json
12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1196 T342284', diff saved to https://phabricator.wikimedia.org/P49814 and previous config saved to /var/cache/conftool/dbconfig/20230731-125252-ladsgroup.json
12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49813 and previous config saved to /var/cache/conftool/dbconfig/20230731-125152-root.json
12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49812 and previous config saved to /var/cache/conftool/dbconfig/20230731-124912-ladsgroup.json
12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49811 and previous config saved to /var/cache/conftool/dbconfig/20230731-124851-ladsgroup.json
12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49810 and previous config saved to /var/cache/conftool/dbconfig/20230731-123647-root.json
12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49809 and previous config saved to /var/cache/conftool/dbconfig/20230731-123345-ladsgroup.json
12:32 moritzm: installing xapian-core bugfix updates on Bullseye
12:23 moritzm: installing mariadb-10.5 updates from Bullseye 11.7 point release (libs/tools, unrelated to wmf-mariadb packages)
12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49808 and previous config saved to /var/cache/conftool/dbconfig/20230731-122142-root.json
12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49807 and previous config saved to /var/cache/conftool/dbconfig/20230731-121839-ladsgroup.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49806 and previous config saved to /var/cache/conftool/dbconfig/20230731-120638-root.json
12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49805 and previous config saved to /var/cache/conftool/dbconfig/20230731-120332-ladsgroup.json
11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49804 and previous config saved to /var/cache/conftool/dbconfig/20230731-115133-root.json
11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 T334650', diff saved to https://phabricator.wikimedia.org/P49803 and previous config saved to /var/cache/conftool/dbconfig/20230731-114645-root.json
11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:11 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
11:11 btullis@cumin1001: Added views for new wiki: wikifunctionswiki T289316
11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
10:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:45 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
10:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=parse1002.eqiad.wmnet
10:36 claime: Repooling parse1002 following CPU replacement - T339340
10:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
10:28 _joe_: disabling puppet on mwdebug2002, testing noc.wikimedia.org
10:20 moritzm: installing bind9 security updates (client-side tools/libs)
10:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
10:02 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
10:02 btullis@cumin1001: Added views for new wiki: btmwiktionary T342670
09:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
09:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
09:51 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115) (duration: 23m 00s)
09:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:45 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
09:41 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=parse1002.eqiad.wmnet
09:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
09:36 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
09:32 urbanecm: Unblock stuck global rename by running `extensions/CentralAuth/maintenance/fixStuckGlobalRename.php` (T343099)
09:29 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable Lift Wing for most wikis (T342115) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
09:28 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115)
09:28 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
09:27 ladsgroup@deploy1002: Finished scap: Backport for Remove ak from wgImportSources (T333765) (duration: 08m 10s)
09:21 ladsgroup@deploy1002: amire80 and ladsgroup: Continuing with sync
09:20 ladsgroup@deploy1002: amire80 and ladsgroup: Backport for Remove ak from wgImportSources (T333765) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
09:19 ladsgroup@deploy1002: Started scap: Backport for Remove ak from wgImportSources (T333765)
09:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
09:10 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
08:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
08:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
08:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49802 and previous config saved to /var/cache/conftool/dbconfig/20230731-083941-ladsgroup.json
08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
07:21 taavi@deploy1002: Finished scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) (duration: 17m 02s)
07:15 taavi@deploy1002: anzx and taavi: Continuing with sync
07:13 taavi@deploy1002: anzx and taavi: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:04 taavi@deploy1002: Started scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)
06:06 moritzm: imported jenkins 2.401.3 to thirdparty/ci for buster-wikimedia T342572

2023-07-29

16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T343077', diff saved to https://phabricator.wikimedia.org/P49801 and previous config saved to /var/cache/conftool/dbconfig/20230729-165954-root.json
16:58 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1183 to s5 primary T343077', diff saved to https://phabricator.wikimedia.org/P49800 and previous config saved to /var/cache/conftool/dbconfig/20230729-165813-root.json
16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Emergency switchover T343077', diff saved to https://phabricator.wikimedia.org/P49799 and previous config saved to /var/cache/conftool/dbconfig/20230729-165748-root.json
16:57 marostegui: Starting emergency s5 eqiad failover from db1130 to db1183 - T343077 T343076
16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1183 with weight 0 T343077', diff saved to https://phabricator.wikimedia.org/P49798 and previous config saved to /var/cache/conftool/dbconfig/20230729-163621-root.json
16:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
16:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
16:19 _joe_: set read_only=0 on db1130
16:15 _joe_: systemctl start mariadb.service on db1130

2023-07-28

22:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions (duration: 00m 21s)
22:16 milimetric@deploy1002: Started deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions
22:03 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page (duration: 00m 03s)
22:03 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page
22:00 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page (duration: 10m 18s)
21:50 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page
20:12 milimetric@deploy1002: Finished deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19 (duration: 16m 53s)
19:55 milimetric@deploy1002: Started deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19
19:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469 (duration: 00m 14s)
19:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469
18:12 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=brwikimedia --logwiki=metawiki 'Viniciuspontesoficial' 'Eusouvinipontes' # T343013
16:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:54 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:44 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:42 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:34 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:32 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:27 kamila@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:26 kamila@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:26 kamila_: k8s: delete and recreate the benthos-cache-invalidator namespace
15:25 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:25 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
14:25 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions (duration: 00m 03s)
14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
14:25 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions
14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
14:21 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions (duration: 06m 11s)
14:15 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
14:15 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
14:15 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions
14:14 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
14:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
14:03 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
13:30 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
12:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
12:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
10:51 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:50 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
10:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
10:41 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
10:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
10:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
10:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:28 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
10:18 dcausse: T342924: created search indices for wikifunctions
10:00 aikochou@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:57 aikochou@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
09:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
00:20 tgr@deploy1002: Finished scap: Backport for help: Fix navigation in the help panel (T342927) (duration: 10m 09s)
00:14 tgr@deploy1002: tgr: Continuing with sync
00:11 tgr@deploy1002: tgr: Backport for help: Fix navigation in the help panel (T342927) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
00:10 tgr@deploy1002: Started scap: Backport for help: Fix navigation in the help panel (T342927)

2023-07-27

21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49790 and previous config saved to /var/cache/conftool/dbconfig/20230727-214302-ladsgroup.json
21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49789 and previous config saved to /var/cache/conftool/dbconfig/20230727-212756-ladsgroup.json
21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49788 and previous config saved to /var/cache/conftool/dbconfig/20230727-211250-ladsgroup.json
20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49787 and previous config saved to /var/cache/conftool/dbconfig/20230727-205744-ladsgroup.json
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49786 and previous config saved to /var/cache/conftool/dbconfig/20230727-203435-ladsgroup.json
20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49785 and previous config saved to /var/cache/conftool/dbconfig/20230727-203415-ladsgroup.json
20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49784 and previous config saved to /var/cache/conftool/dbconfig/20230727-201908-ladsgroup.json
20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49783 and previous config saved to /var/cache/conftool/dbconfig/20230727-200402-ladsgroup.json
19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49782 and previous config saved to /var/cache/conftool/dbconfig/20230727-194856-ladsgroup.json
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1014.eqiad.wmnet with OS bullseye
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
19:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49781 and previous config saved to /var/cache/conftool/dbconfig/20230727-190637-ladsgroup.json
19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49780 and previous config saved to /var/cache/conftool/dbconfig/20230727-190617-ladsgroup.json
18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49779 and previous config saved to /var/cache/conftool/dbconfig/20230727-185110-ladsgroup.json
18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files (duration: 00m 04s)
18:41 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files
18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files (duration: 08m 25s)
18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49778 and previous config saved to /var/cache/conftool/dbconfig/20230727-183604-ladsgroup.json
18:33 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files
18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49777 and previous config saved to /var/cache/conftool/dbconfig/20230727-182058-ladsgroup.json
18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
18:12 krinkle@deploy1002: Finished deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714) (duration: 00m 05s)
18:12 krinkle@deploy1002: Started deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714)
18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1013.eqiad.wmnet with OS bullseye
18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49776 and previous config saved to /var/cache/conftool/dbconfig/20230727-175659-ladsgroup.json
17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49775 and previous config saved to /var/cache/conftool/dbconfig/20230727-175638-ladsgroup.json
17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49774 and previous config saved to /var/cache/conftool/dbconfig/20230727-174132-ladsgroup.json
17:41 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49773 and previous config saved to /var/cache/conftool/dbconfig/20230727-172626-ladsgroup.json
17:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
17:04 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:03 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:03 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
16:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) (duration: 20m 53s)
16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49771 and previous config saved to /var/cache/conftool/dbconfig/20230727-164711-ladsgroup.json
16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49770 and previous config saved to /var/cache/conftool/dbconfig/20230727-164650-ladsgroup.json
16:45 dancy@deploy1002: Installation of scap version "4.57.0" completed for 600 hosts
16:44 dancy@deploy1002: Installing scap version "4.57.0" for 600 hosts
16:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
16:41 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
16:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
16:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
16:33 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:31 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115)
16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49769 and previous config saved to /var/cache/conftool/dbconfig/20230727-163144-ladsgroup.json
16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49768 and previous config saved to /var/cache/conftool/dbconfig/20230727-161638-ladsgroup.json
16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49766 and previous config saved to /var/cache/conftool/dbconfig/20230727-160132-ladsgroup.json
15:56 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:56 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:49 jynus: restart db2097
15:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
15:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
15:45 zabe@deploy1002: Finished scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) (duration: 07m 43s)
15:44 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:42 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:38 zabe@deploy1002: zabe and dreamyjazz: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
15:37 zabe@deploy1002: Started scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902)
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49764 and previous config saved to /var/cache/conftool/dbconfig/20230727-153649-ladsgroup.json
15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49763 and previous config saved to /var/cache/conftool/dbconfig/20230727-153629-ladsgroup.json
15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49762 and previous config saved to /var/cache/conftool/dbconfig/20230727-152123-ladsgroup.json
15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49761 and previous config saved to /var/cache/conftool/dbconfig/20230727-150616-ladsgroup.json
14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49759 and previous config saved to /var/cache/conftool/dbconfig/20230727-145110-ladsgroup.json
14:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:27 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49758 and previous config saved to /var/cache/conftool/dbconfig/20230727-142721-ladsgroup.json
14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49757 and previous config saved to /var/cache/conftool/dbconfig/20230727-142700-ladsgroup.json
14:26 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
14:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
14:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
14:22 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
14:20 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:19 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49756 and previous config saved to /var/cache/conftool/dbconfig/20230727-141154-ladsgroup.json
13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49755 and previous config saved to /var/cache/conftool/dbconfig/20230727-135648-ladsgroup.json
13:55 fabfur: done restarting lvs6002 (T335835)
13:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
13:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
13:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
13:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49754 and previous config saved to /var/cache/conftool/dbconfig/20230727-134141-ladsgroup.json
13:32 fabfur: begin restarting lvs6002 (T335835)
13:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49752 and previous config saved to /var/cache/conftool/dbconfig/20230727-131733-ladsgroup.json
13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49751 and previous config saved to /var/cache/conftool/dbconfig/20230727-131712-ladsgroup.json
13:15 fabfur: done restarting lvs6001 (T335835)
13:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
13:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
13:11 samtar@deploy1002: Finished scap: Backport for Re-enable PC writes for parsoid endpoints (T339867) (duration: 07m 02s)
13:05 samtar@deploy1002: samtar and daniel: Backport for Re-enable PC writes for parsoid endpoints (T339867) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:04 samtar@deploy1002: Started scap: Backport for Re-enable PC writes for parsoid endpoints (T339867)
13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49750 and previous config saved to /var/cache/conftool/dbconfig/20230727-130206-ladsgroup.json
12:54 fabfur: begin restarting lvs6001 (T335835)
12:53 fabfur: done restarting lvs6003 (T335835)
12:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49749 and previous config saved to /var/cache/conftool/dbconfig/20230727-124700-ladsgroup.json
12:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
12:43 fabfur: begin restarting lvs6003 (T335835)
12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49748 and previous config saved to /var/cache/conftool/dbconfig/20230727-123153-ladsgroup.json
12:08 jynus: systemctl stop mariadb@s1 @ db2097
12:07 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Also add square logo for Vector-2022 (duration: 07m 05s)
12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49747 and previous config saved to /var/cache/conftool/dbconfig/20230727-120710-ladsgroup.json
12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:01 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Also add square logo for Vector-2022 synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:00 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Also add square logo for Vector-2022
12:00 ladsgroup@deploy1002: Finished scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) (duration: 08m 54s)
11:52 ladsgroup@deploy1002: ladsgroup: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:51 ladsgroup@deploy1002: Started scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434)
11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
11:49 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
11:48 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add logo, wordmark (duration: 08m 35s)
11:47 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:42 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
11:42 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
11:41 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add logo, wordmark synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:39 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add logo, wordmark
11:37 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
11:31 ladsgroup@deploy1002: backport Cancelled
11:30 ladsgroup@deploy1002: Finished scap: Backport for CentralAuthUser: Don't load user information unless needed (duration: 07m 47s)
11:24 ladsgroup@deploy1002: ladsgroup: Backport for CentralAuthUser: Don't load user information unless needed synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
11:22 ladsgroup@deploy1002: Started scap: Backport for CentralAuthUser: Don't load user information unless needed
11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
11:12 fabfur: done restarting lvs3006 (T335835)
11:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
10:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
10:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1005
10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1005
10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1006
10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1006
10:37 taavi: purge edge caches for "https://wikifunctions.org/"
10:35 fabfur: begin restarting lvs3006 (T335835)
10:34 fabfur: done restarting lvs3005 (T335835)
10:33 kevinbazira@deploy1002: Finished deploy [ores/deploy@c30920f]: T342118 (duration: 09m 04s)
10:24 kevinbazira@deploy1002: Started deploy [ores/deploy@c30920f]: T342118
10:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
10:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
09:54 fabfur: begin restarting lvs3005 (T335835)
09:44 fabfur: done restarting lvs3007 (T335835)
09:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
09:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
09:38 fabfur: begin restarting lvs3007 (T335835)
09:20 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=frwiki --page="Sensibilité électromagnétique" --force` to debug T342488
09:12 fabfur: done restarting lvs1019 (T335835)
09:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
09:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
08:42 fabfur: begin restarting lvs1019 (T335835)
08:34 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.19 refs T340247
07:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
07:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
07:54 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
07:54 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
07:40 XioNoX: reboot lsw1-a1-codfw (test device)
06:53 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
06:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:38 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:36 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
05:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
05:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
05:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
05:26 oblivian@deploy1002: Started scap: (no justification provided)
05:26 _joe_: scap is not syncing; just rebuilding the image from scratch to verify the reason for a bug.
05:22 oblivian@deploy1002: Started scap: (no justification provided)
03:19 cstone: payments-wiki upgraded from 2a68dfe2 to 1a6ca7ab
03:04 eileen: civicrm upgraded from 5a84b138 to 16c2e58a
00:54 eileen: civicrm upgraded from 68f29b70 to 5a84b138
00:51 eileen: civicrm upgraded from 853c14f3 to 68f29b70
00:20 eileen: rollback because I got an error when I tried to view - so let's see
00:20 eileen: civicrm rolled back from 68f29b70 to 853c14f3 (locked)
00:17 eileen: civicrm upgraded from 853c14f3 to 68f29b70

2023-07-26

23:01 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache now that wikifunctions is here (duration: 06m 52s)
21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wcqs2001.codfw.wmnet
21:46 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wcqs2001.codfw.wmnet
21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49745 and previous config saved to /var/cache/conftool/dbconfig/20230726-212310-ladsgroup.json
21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49744 and previous config saved to /var/cache/conftool/dbconfig/20230726-210804-ladsgroup.json
21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
21:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
21:00 taavi: manually attach User:WikiLambda_system to SUL T342811
20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49743 and previous config saved to /var/cache/conftool/dbconfig/20230726-205257-ladsgroup.json
20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49742 and previous config saved to /var/cache/conftool/dbconfig/20230726-203751-ladsgroup.json
20:34 taavi@deploy1002: Finished scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) (duration: 26m 17s)
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49741 and previous config saved to /var/cache/conftool/dbconfig/20230726-201554-ladsgroup.json
20:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49740 and previous config saved to /var/cache/conftool/dbconfig/20230726-201533-ladsgroup.json
20:09 taavi@deploy1002: dreamyjazz and taavi: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:08 taavi@deploy1002: Started scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158)
20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49739 and previous config saved to /var/cache/conftool/dbconfig/20230726-200026-ladsgroup.json
19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49738 and previous config saved to /var/cache/conftool/dbconfig/20230726-194520-ladsgroup.json
19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49737 and previous config saved to /var/cache/conftool/dbconfig/20230726-193014-ladsgroup.json
18:48 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:47 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:45 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:44 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49736 and previous config saved to /var/cache/conftool/dbconfig/20230726-184430-ladsgroup.json
18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:44 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49735 and previous config saved to /var/cache/conftool/dbconfig/20230726-184408-ladsgroup.json
18:43 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:37 jforrester@deploy1002: Synchronized wmf-config/: Last fixes for initial wikifunctions.org, he says (duration: 06m 44s)
18:37 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:36 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:34 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:34 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:34 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49734 and previous config saved to /var/cache/conftool/dbconfig/20230726-182902-ladsgroup.json
18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49732 and previous config saved to /var/cache/conftool/dbconfig/20230726-181356-ladsgroup.json
18:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:12 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:12 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:09 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49731 and previous config saved to /var/cache/conftool/dbconfig/20230726-175850-ladsgroup.json
17:49 jforrester@deploy1002: Finished scap: Hopefully final update for wikifunctions.org initial config (duration: 07m 30s)
17:41 jforrester@deploy1002: Started scap: Hopefully final update for wikifunctions.org initial config
17:37 jforrester@deploy1002: Finished scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) (duration: 11m 27s)
17:27 jforrester@deploy1002: jforrester: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
17:25 jforrester@deploy1002: Started scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945)
17:13 jforrester@deploy1002: Finished scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) (duration: 08m 40s)
17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49730 and previous config saved to /var/cache/conftool/dbconfig/20230726-171244-ladsgroup.json
17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49729 and previous config saved to /var/cache/conftool/dbconfig/20230726-171223-ladsgroup.json
17:06 jforrester@deploy1002: jforrester: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
17:04 jforrester@deploy1002: Started scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945)
16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49728 and previous config saved to /var/cache/conftool/dbconfig/20230726-165717-ladsgroup.json
16:53 jforrester@deploy1002: Finished scap: Backport for docroot: Add wikifunctions.org (T275945) (duration: 08m 05s)
16:47 jforrester@deploy1002: jforrester: Backport for docroot: Add wikifunctions.org (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:45 jforrester@deploy1002: Started scap: Backport for docroot: Add wikifunctions.org (T275945)
16:44 fabfur: end reboot of lvs1018 (T335835)
16:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49727 and previous config saved to /var/cache/conftool/dbconfig/20230726-164211-ladsgroup.json
16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
16:27 jforrester@deploy1002: Finished scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945 (duration: 07m 49s)
16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49726 and previous config saved to /var/cache/conftool/dbconfig/20230726-162705-ladsgroup.json
16:20 jforrester@deploy1002: Started scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945
16:18 fabfur: begin reboot of lvs1018 (T335835)
16:15 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) (duration: 09m 07s)
16:08 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:07 fabfur: end reboot of lvs1017 (T335835)
16:06 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945)
16:03 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 06m 56s)
15:56 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
15:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
15:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
15:47 jforrester@deploy1002: Finished scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) (duration: 09m 39s)
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49725 and previous config saved to /var/cache/conftool/dbconfig/20230726-154245-ladsgroup.json
15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49724 and previous config saved to /var/cache/conftool/dbconfig/20230726-154209-ladsgroup.json
15:39 jforrester@deploy1002: dcausse and jforrester: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
15:38 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
15:37 jforrester@deploy1002: Started scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744)
15:37 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
15:37 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
15:35 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
15:34 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
15:34 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
15:32 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
15:32 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
15:30 fabfur: begin reboot of lvs1017 (T335835)
15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49723 and previous config saved to /var/cache/conftool/dbconfig/20230726-152703-ladsgroup.json
15:26 fabfur: end reboot of lvs1020 (T335835)
15:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
15:21 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
15:20 fabfur: begin reboot of lvs1020 (T335835)
15:17 fabfur: end reboot of lvs4009 (T335835)
15:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
15:13 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) (duration: 14m 06s)
15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49722 and previous config saved to /var/cache/conftool/dbconfig/20230726-151157-ladsgroup.json
15:10 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
15:00 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:59 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115)
14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49721 and previous config saved to /var/cache/conftool/dbconfig/20230726-145651-ladsgroup.json
14:49 fabfur: begin reboot of lvs4009 (T335835)
14:38 jforrester@deploy1002: Finished scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) (duration: 08m 24s)
14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
14:33 hnowlan: enabling puppet on A:cp to deploy r/941440
14:32 jforrester@deploy1002: jforrester: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:31 kuncung: test
14:30 jforrester@deploy1002: Started scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733)
14:28 fabfur: end reboot of lvs4008 (T335835)
14:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
14:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
14:19 hnowlan: disabling puppet on A:cp to deploy r/941440
14:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
14:13 urbanecm@deploy1002: Finished scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) (duration: 12m 33s)
14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49720 and previous config saved to /var/cache/conftool/dbconfig/20230726-141228-ladsgroup.json
14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
14:02 urbanecm@deploy1002: urbanecm: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:00 urbanecm@deploy1002: Started scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747)
14:00 fabfur: begin reboot of lvs4008 (T335835)
13:55 fabfur: end reboot of lvs4010 (T335835)
13:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
13:50 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
13:46 fabfur: begin reboot of lvs4010 (T335835)
13:34 jforrester@deploy1002: Finished scap: Backport for Add stream config for iOS schema (T341896) (duration: 20m 16s)
13:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
13:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49719 and previous config saved to /var/cache/conftool/dbconfig/20230726-133104-ladsgroup.json
13:23 jforrester@deploy1002: jforrester and tsev: Backport for Add stream config for iOS schema (T341896) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:23 fab@deploy1002: Finished deploy [airflow-dags/research@e7b9253]: (no justification provided) (duration: 00m 07s)
13:22 fab@deploy1002: Started deploy [airflow-dags/research@e7b9253]: (no justification provided)
13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49718 and previous config saved to /var/cache/conftool/dbconfig/20230726-131557-ladsgroup.json
13:14 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
13:13 jforrester@deploy1002: sync-world aborted: Backport for Add stream config for iOS schema (T341896) (duration: 11m 00s)
13:05 James_F: Created cu_useragent_clienthints.sql and cu_useragent_clienthints_map.sql on testwiki for T258105
13:02 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49717 and previous config saved to /var/cache/conftool/dbconfig/20230726-130051-ladsgroup.json
12:52 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: Update WikiLambda wmf.19 branch to latest ahead of wikifunctions.org roll-out (duration: 07m 10s)
12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49716 and previous config saved to /var/cache/conftool/dbconfig/20230726-124545-ladsgroup.json
12:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
12:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
12:36 jforrester@deploy1002: Finished scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) (duration: 07m 39s)
12:30 jforrester@deploy1002: jforrester: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:28 jforrester@deploy1002: Started scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314)
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49714 and previous config saved to /var/cache/conftool/dbconfig/20230726-115528-ladsgroup.json
11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49713 and previous config saved to /var/cache/conftool/dbconfig/20230726-115507-ladsgroup.json
11:50 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:48 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:48 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:47 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:46 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:45 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49712 and previous config saved to /var/cache/conftool/dbconfig/20230726-114001-ladsgroup.json
11:32 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
11:27 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49711 and previous config saved to /var/cache/conftool/dbconfig/20230726-112454-ladsgroup.json
11:19 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
11:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49710 and previous config saved to /var/cache/conftool/dbconfig/20230726-110948-ladsgroup.json
11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases1002.eqiad.wmnet
11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
11:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
11:01 eoghan@cumin1001: START - Cookbook sre.dns.netbox
10:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases1002.eqiad.wmnet
10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases2002.codfw.wmnet
10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
10:56 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
10:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
10:53 eoghan@cumin1001: START - Cookbook sre.dns.netbox
10:31 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases2002.codfw.wmnet
10:27 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49708 and previous config saved to /var/cache/conftool/dbconfig/20230726-102232-ladsgroup.json
10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
10:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
09:59 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
08:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
08:34 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 19m 56s)
08:14 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
08:01 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
07:58 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
07:56 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
07:52 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
off: updating bookworm netboot image for point release 12.1 ( https://wikitech.wikimedia.org/wiki/Updating_netboot_image_with_newer_kernel#Updating_production_point_release )
07:46 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
07:37 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
07:37 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
07:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
07:36 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
07:35 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
07:35 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
07:30 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
07:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
07:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
06:48 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:47 oblivian@cumin1001: START - Cookbook sre.dns.netbox
06:37 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
06:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
06:34 marostegui: Stop mariadb on clouddb1021 T334651
06:33 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
06:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
06:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
06:21 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
06:18 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
06:17 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
06:17 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
06:15 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
01:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-25

22:52 eileen: revision c62433ab -> 8689d10d
21:18 zabe@deploy1002: Finished scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) (duration: 10m 06s)
21:10 zabe@deploy1002: zabe: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:08 zabe@deploy1002: Started scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655)
21:02 zabe@deploy1002: Finished scap: update interwiki cache, gerrit:941057 (duration: 07m 20s)
20:55 zabe@deploy1002: Started scap: update interwiki cache, gerrit:941057
20:53 zabe@deploy1002: Finished scap: T335216 (duration: 08m 24s)
20:46 zabe@deploy1002: zabe: T335216 synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:44 zabe@deploy1002: Started scap: T335216
20:42 zabe: create Wiktionary Mandailing # T335216
20:33 taavi@deploy1002: Finished scap: Backport for Fix text showing on icon only buttons (duration: 12m 08s)
20:23 taavi@deploy1002: taavi and bwang: Backport for Fix text showing on icon only buttons synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:21 taavi@deploy1002: Started scap: Backport for Fix text showing on icon only buttons
18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
18:23 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
18:21 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
18:21 sukhe: dummy authdns-update returns
18:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4003.wikimedia.org
18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4003.wikimedia.org
17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4004.wikimedia.org
17:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4004.wikimedia.org
16:56 sukhe: dummy authdns-update
16:51 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
16:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
16:45 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
16:41 fabfur: end rebooting lvs5005 (T335835)
16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
16:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
16:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
16:19 fabfur: begin rebooting lvs5005 (T335835)
15:57 dancy@deploy1002: Finished deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided) (duration: 33m 26s)
15:36 fabfur: lvs5004 restarted and services are reactivating (T335835)
15:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
15:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
15:25 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5004.eqsin.wmnet
15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
15:25 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:23 dancy@deploy1002: Started deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided)
15:22 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
15:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
15:17 _joe_: removing all tags for docker image openjdk-8-jre T341115
15:16 zabe@deploy1002: Finished scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) (duration: 07m 51s)
15:14 _joe_: removing all tags for docker image openjdk-8-jdk T341115
15:10 zabe@deploy1002: zabe: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
15:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
15:08 zabe@deploy1002: Started scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217)
14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
14:58 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
14:58 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
14:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
14:47 damilare: SmashPig upgraded from 9ee24eef to f40badde
14:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
14:45 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
14:45 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
14:43 fabfur: begin rebooting lvs5004 (T335835)
14:35 fabfur: lvs5006 rebooted and services restarted (T335835)
14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
14:30 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
14:30 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:30 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:29 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:29 hnowlan: disabling puppet on A:cp for rollout of r/941405
14:28 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
14:28 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:27 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
14:26 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:24 fabfur: start stopping services and rebooting lvs5006 (T335835)
14:12 damilare: SmashPig upgraded from a9156920 to 9ee24eef
14:02 urbanecm@deploy1002: Finished scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) (duration: 22m 27s)
14:00 sukhe: rolling out pdns-recursor update on A:dns-rec
13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
13:42 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
13:41 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
13:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
13:40 urbanecm@deploy1002: Started scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158)
13:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1486.eqiad.wmnet with OS buster
13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
13:38 urbanecm@deploy1002: Finished scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586) (duration: 07m 28s)
13:30 urbanecm@deploy1002: Started scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586)
13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49704 and previous config saved to /var/cache/conftool/dbconfig/20230725-132121-ladsgroup.json
13:20 godog: powercycle parse1002 - T339340
13:17 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49702 and previous config saved to /var/cache/conftool/dbconfig/20230725-130615-ladsgroup.json
12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49701 and previous config saved to /var/cache/conftool/dbconfig/20230725-125109-ladsgroup.json
12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49700 and previous config saved to /var/cache/conftool/dbconfig/20230725-123602-ladsgroup.json
12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49699 and previous config saved to /var/cache/conftool/dbconfig/20230725-120641-ladsgroup.json
12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
11:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:48 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:47 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
11:36 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
11:34 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:32 akosiaris: T340087 wikidiff2 rollout done. 1 host is unreachable and will need to be reimaged or upgraded manually to pick this up, parse1002.eqiad.wmnet
11:30 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to eqiad. codfw already done.
11:28 akosiaris: restart php on mw1457
11:25 akosiaris: T340087 keep a copy php-wikidiff2_1.13.0-1_amd64.deb in apt1001:/home/akosiaris/wd/ in case of emergency
11:24 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to codfw
10:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
09:50 elukey: restart kafka on kafka-main1001 to pick up the new changes - T341558
09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
09:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
09:06 slyngs: Restart Tomcat / Apereo CAS on idp1002
09:01 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
08:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
08:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
08:51 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.17 (duration: 02m 11s)
08:49 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.19 refs T340247 (duration: 52m 35s)
08:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49696 and previous config saved to /var/cache/conftool/dbconfig/20230725-080326-root.json
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49695 and previous config saved to /var/cache/conftool/dbconfig/20230725-080315-root.json
07:57 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.19 refs T340247
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49694 and previous config saved to /var/cache/conftool/dbconfig/20230725-074821-root.json
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49693 and previous config saved to /var/cache/conftool/dbconfig/20230725-074810-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49692 and previous config saved to /var/cache/conftool/dbconfig/20230725-073317-root.json
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49691 and previous config saved to /var/cache/conftool/dbconfig/20230725-073305-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49690 and previous config saved to /var/cache/conftool/dbconfig/20230725-071812-root.json
07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49689 and previous config saved to /var/cache/conftool/dbconfig/20230725-071801-root.json
07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49688 and previous config saved to /var/cache/conftool/dbconfig/20230725-070307-root.json
07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49687 and previous config saved to /var/cache/conftool/dbconfig/20230725-070256-root.json
06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49686 and previous config saved to /var/cache/conftool/dbconfig/20230725-064802-root.json
06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49685 and previous config saved to /var/cache/conftool/dbconfig/20230725-064751-root.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49684 and previous config saved to /var/cache/conftool/dbconfig/20230725-063258-root.json
06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49683 and previous config saved to /var/cache/conftool/dbconfig/20230725-063247-root.json
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49682 and previous config saved to /var/cache/conftool/dbconfig/20230725-061753-root.json
06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49681 and previous config saved to /var/cache/conftool/dbconfig/20230725-061742-root.json
06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1213 (s5, s6)', diff saved to https://phabricator.wikimedia.org/P49680 and previous config saved to /var/cache/conftool/dbconfig/20230725-061319-root.json
06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2004-2006].codfw.wmnet
06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
05:52 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
05:46 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
05:09 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
05:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts wdqs[2004-2006].codfw.wmnet
04:56 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
03:47 eileen: civicrm upgraded from ad642712 to 853c14f3
03:31 eileen: civicrm upgraded from d7c8d77e to ad642712 (back to head as the rollback didn't do anything)
03:11 eileen: civicrm changed from ad642712 to d7c8d77e (locked)
02:59 eileen: civicrm upgraded from 6fd25bf6 to ad642712
01:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
01:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
00:50 wfan: civicrm upgraded from d7c8d77e to 6fd25bf6
00:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-24

23:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
22:46 zabe@deploy1002: Finished scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) (duration: 09m 59s)
22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
22:37 zabe@deploy1002: zabe: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experime
22:36 zabe@deploy1002: Started scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834)
22:16 jgleeson: civiproxy upgraded from 99cecb92 to c000fc1e
21:28 maryum: Deployed patch for T341565
21:14 sbassett: Deployed updated mitigation for T336027
20:04 dancy@deploy1002: Installing scap version "4.56.0" for 605 hosts
19:29 krinkle@deploy1002: Synchronized lib/: Iaa0cb0c75d4 (duration: 06m 21s)
19:21 krinkle@deploy1002: Synchronized src/Profiler.php: Idada376134 (duration: 06m 30s)
18:09 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
18:08 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
18:08 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
18:07 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
18:07 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
18:06 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:39 sukhe: restart ATS to pick up CR 940953: T339134
16:00 topranks: Re-enabling disabled transport from knams to esams after fiber cleaning T337997
14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
14:44 vgutierrez: Repooling cp4052 (upload) running ATS 9.2.1 - T339134
14:37 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:36 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:36 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:29 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:23 samtar@deploy1002: Finished scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) (duration: 14m 05s)
14:11 samtar@deploy1002: samtar: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
14:09 samtar@deploy1002: Started scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268)
14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49677 and previous config saved to /var/cache/conftool/dbconfig/20230724-140226-root.json
14:02 samtar@deploy1002: Finished scap: Backport for Revert "Run a synthetic test for client side preferences" (duration: 07m 20s)
14:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
13:56 samtar@deploy1002: samtar: Backport for Revert "Run a synthetic test for client side preferences" synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49676 and previous config saved to /var/cache/conftool/dbconfig/20230724-135604-root.json
13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49675 and previous config saved to /var/cache/conftool/dbconfig/20230724-135557-root.json
13:54 samtar@deploy1002: Started scap: Backport for Revert "Run a synthetic test for client side preferences"
13:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
13:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
13:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:51 TheresNoTime: close UTC afternoon backport window
13:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
13:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:49 TheresNoTime: gerrit:939312 not synced. T336527 T339268
13:48 samtar@deploy1002: Sync cancelled.
13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49674 and previous config saved to /var/cache/conftool/dbconfig/20230724-134721-root.json
13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49673 and previous config saved to /var/cache/conftool/dbconfig/20230724-134059-root.json
13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49672 and previous config saved to /var/cache/conftool/dbconfig/20230724-134052-root.json
13:38 taavi: run `taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php mywiktionary --fix` after purging null editing page #131577 for T342516
13:34 samtar@deploy1002: samtar and mabualruz: Backport for Run a synthetic test for client side preferences (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:33 samtar@deploy1002: Started scap: Backport for Run a synthetic test for client side preferences (T336527 T339268)
13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49671 and previous config saved to /var/cache/conftool/dbconfig/20230724-133217-root.json
13:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
13:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
13:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
13:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49669 and previous config saved to /var/cache/conftool/dbconfig/20230724-132555-root.json
13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49668 and previous config saved to /var/cache/conftool/dbconfig/20230724-132548-root.json
13:25 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php mywiktionary --fix` T342516
13:25 samtar@deploy1002: Finished scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) (duration: 21m 28s)
13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49667 and previous config saved to /var/cache/conftool/dbconfig/20230724-131712-root.json
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49665 and previous config saved to /var/cache/conftool/dbconfig/20230724-131050-root.json
13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49664 and previous config saved to /var/cache/conftool/dbconfig/20230724-131043-root.json
13:05 samtar@deploy1002: anzx and samtar: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:03 vgutierrez: depooling cp4052 for some ATS 9.2.1 testing - T339134
13:03 samtar@deploy1002: Started scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516)
13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49663 and previous config saved to /var/cache/conftool/dbconfig/20230724-130208-root.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49662 and previous config saved to /var/cache/conftool/dbconfig/20230724-125545-root.json
12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49661 and previous config saved to /var/cache/conftool/dbconfig/20230724-125538-root.json
12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49660 and previous config saved to /var/cache/conftool/dbconfig/20230724-124703-root.json
12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49659 and previous config saved to /var/cache/conftool/dbconfig/20230724-124040-root.json
12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49658 and previous config saved to /var/cache/conftool/dbconfig/20230724-124034-root.json
12:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
12:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49656 and previous config saved to /var/cache/conftool/dbconfig/20230724-123158-root.json
12:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49655 and previous config saved to /var/cache/conftool/dbconfig/20230724-122536-root.json
12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49654 and previous config saved to /var/cache/conftool/dbconfig/20230724-122529-root.json
12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49653 and previous config saved to /var/cache/conftool/dbconfig/20230724-121653-root.json
12:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
12:14 dcausse@deploy1002: Finished deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page (duration: 00m 12s)
12:14 dcausse@deploy1002: Started deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page
12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P49652 and previous config saved to /var/cache/conftool/dbconfig/20230724-121329-root.json
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49651 and previous config saved to /var/cache/conftool/dbconfig/20230724-121031-root.json
12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49650 and previous config saved to /var/cache/conftool/dbconfig/20230724-121024-root.json
12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2169 (s6, s7)', diff saved to https://phabricator.wikimedia.org/P49649 and previous config saved to /var/cache/conftool/dbconfig/20230724-120609-root.json
10:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:51 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
10:51 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:44 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:41 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940880 (T342211) to eqiad DC, only one left (disable keepalive on port 80 on A:cp)
10:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:39 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
10:39 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
09:26 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940873 (T342211) to drmrs DC (disable keepalive on port 80 on A:cp-drmrs)
09:26 dcausse@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
09:24 dcausse@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
09:22 vgutierrez: rollback to trafficserver 9.1.4 in cp4052 - T339134
09:15 ladsgroup@cumin1001: START - Cookbook sre.mysql.clone of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
09:13 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
09:12 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
09:08 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
09:08 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
09:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:01 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
08:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
08:58 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
08:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
08:56 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
08:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
08:45 vgutierrez: testing trafficserver 9.2.1 in cp4052 (upload node) - T339134
08:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
08:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
08:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
08:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
08:33 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:33 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:31 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:30 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
08:30 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
08:29 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:28 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:22 dcausse@deploy1002: Finished deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table (duration: 00m 12s)
08:22 dcausse@deploy1002: Started deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table
07:40 urbanecm@deploy1002: Finished scap: Backport for ChangeMentor: Refactor the notification conditions (T336875) (duration: 07m 02s)
07:33 urbanecm@deploy1002: Started scap: Backport for ChangeMentor: Refactor the notification conditions (T336875)
07:32 urbanecm@deploy1002: Finished scap: Backport for Add reassignMentees.php maintenance script (T330071) (duration: 14m 39s)
07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
07:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
07:25 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
07:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
07:23 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
07:17 urbanecm@deploy1002: Started scap: Backport for Add reassignMentees.php maintenance script (T330071)
06:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
06:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
06:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
06:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.

2023-07-23

19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
19:53 sukhe@cumin2002: START - Cookbook sre.network.cf
01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
01:15 sukhe@cumin2002: START - Cookbook sre.network.cf

2023-07-21

21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1149.eqiad.wmnet with OS bullseye
21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
20:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1150.eqiad.wmnet with OS bullseye
20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:15 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:14 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
20:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
19:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1150.eqiad.wmnet with OS bullseye
19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1151.eqiad.wmnet with OS bullseye
19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
19:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
19:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1151.eqiad.wmnet with OS bullseye
19:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1152.eqiad.wmnet with OS bullseye
18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
18:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
18:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
18:16 dancy@deploy1002: Finished scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) (duration: 17m 31s)
18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:00 dancy@deploy1002: daimona and dancy: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
17:59 dancy@deploy1002: Started scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452)
17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
17:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
17:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.sonic-ssh (exit_code=0) for network device lsw1-e8-eqiad
15:11 ayounsi@cumin1001: START - Cookbook sre.network.sonic-ssh for network device lsw1-e8-eqiad
15:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:06 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:05 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
14:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
14:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
14:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
14:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
14:37 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power off
14:17 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power cycle
14:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
12:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1014
12:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1014
12:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1013
12:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1013
12:49 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
12:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
12:46 jclark@cumin1001: START - Cookbook sre.dns.netbox
12:39 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
12:38 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
12:37 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
12:35 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
12:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
12:03 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
11:47 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
10:49 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
10:49 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
10:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
10:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
10:30 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS bookworm
10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
10:26 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
10:16 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
10:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
10:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
10:08 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
09:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
09:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
09:58 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
09:57 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1015.eqiad.wmnet with OS bookworm
09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
09:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS bookworm
09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
09:47 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
09:36 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
09:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
09:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
09:19 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1014.eqiad.wmnet with OS bookworm
09:09 jayme: enable puppet on C:confd - T341669
09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49645 and previous config saved to /var/cache/conftool/dbconfig/20230721-090625-root.json
09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49644 and previous config saved to /var/cache/conftool/dbconfig/20230721-090003-root.json
08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49643 and previous config saved to /var/cache/conftool/dbconfig/20230721-085955-root.json
08:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
08:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
08:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49642 and previous config saved to /var/cache/conftool/dbconfig/20230721-085120-root.json
08:48 jayme: ignore "disabling puppet in C:cumin" - was a typo
08:47 jayme: disabling puppet in C:confd - T341669
08:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
08:47 jayme: disabling puppet in C:cumin - T341669
08:45 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
08:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49641 and previous config saved to /var/cache/conftool/dbconfig/20230721-084459-root.json
08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49640 and previous config saved to /var/cache/conftool/dbconfig/20230721-084450-root.json
08:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49639 and previous config saved to /var/cache/conftool/dbconfig/20230721-083616-root.json
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49638 and previous config saved to /var/cache/conftool/dbconfig/20230721-082954-root.json
08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49637 and previous config saved to /var/cache/conftool/dbconfig/20230721-082946-root.json
08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49636 and previous config saved to /var/cache/conftool/dbconfig/20230721-082111-root.json
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49635 and previous config saved to /var/cache/conftool/dbconfig/20230721-081449-root.json
08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49634 and previous config saved to /var/cache/conftool/dbconfig/20230721-081441-root.json
08:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49633 and previous config saved to /var/cache/conftool/dbconfig/20230721-080606-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49632 and previous config saved to /var/cache/conftool/dbconfig/20230721-075944-root.json
07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49631 and previous config saved to /var/cache/conftool/dbconfig/20230721-075936-root.json
07:57 zabe@deploy1002: Finished scap: T342405 (duration: 07m 03s)
07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49630 and previous config saved to /var/cache/conftool/dbconfig/20230721-075101-root.json
07:50 zabe@deploy1002: Started scap: T342405
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49629 and previous config saved to /var/cache/conftool/dbconfig/20230721-074440-root.json
07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49628 and previous config saved to /var/cache/conftool/dbconfig/20230721-074431-root.json
07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49627 and previous config saved to /var/cache/conftool/dbconfig/20230721-073557-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49626 and previous config saved to /var/cache/conftool/dbconfig/20230721-072935-root.json
07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49625 and previous config saved to /var/cache/conftool/dbconfig/20230721-072927-root.json
07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49624 and previous config saved to /var/cache/conftool/dbconfig/20230721-072052-root.json
07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201', diff saved to https://phabricator.wikimedia.org/P49623 and previous config saved to /var/cache/conftool/dbconfig/20230721-071623-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49622 and previous config saved to /var/cache/conftool/dbconfig/20230721-071430-root.json
07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49621 and previous config saved to /var/cache/conftool/dbconfig/20230721-071422-root.json
07:12 marostegui: Upgrade dbstore1005 to MariaDB 10.6 T334652
07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2171 (s5 and s6)', diff saved to https://phabricator.wikimedia.org/P49620 and previous config saved to /var/cache/conftool/dbconfig/20230721-070110-root.json
06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3209
06:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3209
06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398203
06:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398203
06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139418
06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139418
06:36 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 139148
06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139148
04:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
04:00 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
03:51 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs202([1-2])\.codfw\.wmnet
03:50 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=active; selector: name=wdqs222([0-1])\.codfw\.wmnet
00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1153.eqiad.wmnet with OS bullseye
00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage

2023-07-20

23:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage
23:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1154.eqiad.wmnet with OS bullseye
23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
23:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
22:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1154.eqiad.wmnet with OS bullseye
22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1155.eqiad.wmnet with OS bullseye
22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
22:29 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
22:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1155.eqiad.wmnet with OS bullseye
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1156.eqiad.wmnet with OS bullseye
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
21:18 hashar@deploy1002: Finished deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚 (duration: 00m 07s)
21:18 hashar@deploy1002: Started deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚
21:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
20:25 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
20:25 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
20:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
18:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.18 refs T340246
18:44 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2020.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs20{20}.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2019.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2018.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2016.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2015.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2014.codfw.wmnet
18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2013.codfw.wmnet
18:41 bking@cumin1001: conftool action : set/pooled=yes,set/weight=10; selector: name=wdqs2013-19.codfw.wmnet
18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:38 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1002.eqiad.wmnet with OS bookworm
17:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
17:39 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
17:36 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
17:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
17:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1001.eqiad.wmnet with OS bookworm
17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
17:12 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
17:09 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
17:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
16:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
16:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
16:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
16:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
16:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
16:49 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940190 (T342211) to codfw DC (disable keepalive on port 80 on A:cp-codfw)
16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1073.eqiad.wmnet with OS bullseye
16:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
16:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
16:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1013.eqiad.wmnet with OS bookworm
16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
16:38 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
16:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
16:37 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
16:31 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
16:22 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
16:21 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
16:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
16:20 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
16:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
16:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
16:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
16:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
16:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
16:03 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcontrol1005
16:03 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
16:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
15:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
15:51 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
15:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
15:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
15:48 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
15:48 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
15:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
15:46 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
15:46 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
15:31 elukey: stop kafka main eqiad maintenance - T341558
15:20 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
15:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1003.eqiad.wmnet with OS bookworm
15:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
15:08 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
15:07 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
15:06 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
14:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1075.eqiad.wmnet with OS bullseye
14:58 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940150 (T342211) to ulsfo DC (disable keepalive on port 80 on A:cp-ulsfo)
14:56 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:56 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
14:51 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
14:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
14:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1073.eqiad.wmnet with OS bullseye
14:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts analytics1073.eqiad.wmnet
14:45 herron: roll restart codfw/eqiad low-traffic pybals to add prometheus-https T326657
14:45 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts analytics1073.eqiad.wmnet
14:41 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:41 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
14:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
14:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
14:36 sukhe: run agent on cumin -b1 -s30 'A:dns-rec and not P{dns4004*}'
14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
14:32 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host flink-zk1003.eqiad.wmnet with OS bookworm
14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
14:30 sukhe: disable puppet on A:dns-rec to slowly roll out CR 937991
14:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
14:14 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
14:13 sukhe: dns1004 upgrade to pdns-rec 4.8.4: T341611
14:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
14:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
14:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
13:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
13:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:57 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
13:57 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
13:55 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
13:55 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
13:54 bking@cumin1001: START - Cookbook sre.dns.netbox
13:54 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
13:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:45 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
13:32 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
13:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
13:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:18 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:17 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
13:12 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
13:09 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
13:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
13:00 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
12:56 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
12:44 topranks: LDAP - adding user ifrahkh to groups wmde & nda
12:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
12:43 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
12:24 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
12:20 zabe@deploy1002: Finished scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) (duration: 08m 22s)
12:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
12:13 zabe@deploy1002: zabe and dreamyjazz: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
12:12 zabe@deploy1002: Started scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322)
11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
11:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
11:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
11:35 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940101 (T342211) to eqsin DC (disable keepalive on port 80 on A:cp-eqsin)
10:53 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:40 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
10:33 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
09:26 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1357.eqiad.wmnet
09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1357.eqiad.wmnet
09:25 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1356.eqiad.wmnet
09:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
09:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
09:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
09:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
08:31 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940091 (T342211) to esams DC (disable keepalive on port 80)
08:29 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
08:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
08:25 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
08:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
08:24 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
08:23 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
08:21 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
08:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
07:56 apergos: UTC morning backport and config training window really complete
07:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
07:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
07:40 apergos: UTC morning backport and config training window reopened for fix to the last noc patch
07:37 apergos: UTC morning backport and config training window complete
07:36 ariel@deploy1002: Finished scap: Backport for noc/db.php: use the new etcd fetch function (T341859) (duration: 09m 14s)
07:29 ariel@deploy1002: oblivian and ariel: Backport for noc/db.php: use the new etcd fetch function (T341859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:27 ariel@deploy1002: Started scap: Backport for noc/db.php: use the new etcd fetch function (T341859)
07:25 ariel@deploy1002: Finished scap: Backport for noc: add script to dump etcd db config (T341859) (duration: 09m 35s)
07:17 ariel@deploy1002: oblivian and ariel: Backport for noc: add script to dump etcd db config (T341859) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:16 ariel@deploy1002: Started scap: Backport for noc: add script to dump etcd db config (T341859)
07:12 ariel@deploy1002: Finished scap: Backport for Enable EditInSequence in pawikisource (duration: 09m 52s)
07:04 ariel@deploy1002: ariel and soda: Backport for Enable EditInSequence in pawikisource synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:02 ariel@deploy1002: Started scap: Backport for Enable EditInSequence in pawikisource
06:37 elukey: start kafka main eqiad maintenance (partitions rebalancing) - T341558
04:33 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.*
04:28 eileen: civicrm upgraded from 0cde2608 to d7c8d77e
01:46 tstarling@deploy1002: Synchronized php-1.41.0-wmf.18/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 08m 32s)
01:35 tstarling@deploy1002: Synchronized php-1.41.0-wmf.17/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 09m 20s)

2023-07-19

22:36 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
22:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
22:08 eileen: civicrm upgraded from 7642b3d9 to 0cde2608
21:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
21:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
21:37 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:37 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:36 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:32 bking@cumin1001: START - Cookbook sre.dns.netbox
21:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
21:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 05s)
21:26 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
21:21 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 47s)
21:20 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
20:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
20:55 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:55 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
20:54 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
20:39 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
20:39 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
20:39 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:38 bking@cumin1001: START - Cookbook sre.dns.netbox
20:38 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
20:33 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
20:33 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
20:31 TheresNoTime: backport window closed
20:28 samtar@deploy1002: Finished scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) (duration: 17m 09s)
20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
20:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
20:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
20:12 samtar@deploy1002: samtar and hubaishan: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:11 samtar@deploy1002: Started scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725)
19:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and A:durum
19:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
19:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
19:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
19:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
19:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
19:37 bking@cumin1001: START - Cookbook sre.dns.netbox
19:37 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
19:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
19:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
19:35 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
19:28 bking@cumin1001: START - Cookbook sre.dns.netbox
19:24 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
19:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
18:58 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
18:41 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and A:durum
18:29 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
18:25 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.18 refs T340246
17:58 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:50 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1013.eqiad.wmnet with OS bullseye
17:49 Amir1: powercycled db1218 (T342284)
17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
17:41 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
17:40 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1218', diff saved to https://phabricator.wikimedia.org/P49603 and previous config saved to /var/cache/conftool/dbconfig/20230719-174019-sukhe.json
17:40 sukhe: depool db1218
17:32 sukhe: dummy run of authdns-update
17:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1011.eqiad.wmnet
17:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1011.eqiad.wmnet
17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1010.eqiad.wmnet
17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5004.wikimedia.org
17:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1010.eqiad.wmnet
17:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5004.wikimedia.org
17:09 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
17:02 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
17:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1009.eqiad.wmnet
16:56 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
16:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
16:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1009.eqiad.wmnet
16:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
16:47 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:30 joal@deploy1002: Finished deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs (duration: 00m 15s)
16:29 joal@deploy1002: Started deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs
16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
16:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
16:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
15:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:46 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:44 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:43 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
15:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
15:35 apergos: dumpsdata1007 is now the fallback host for sql/xml dumps and for misc dumps. dumpsdata1004, the former fallback host, is now a spare.
15:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
15:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:28 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
15:25 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
15:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
15:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:23 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
15:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
15:19 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:18 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
15:18 sukhe@cumin2002: START - Cookbook sre.dns.netbox
15:14 robh: mw140[89] downtime for relocation per T308339
15:13 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
15:11 robh: mw141[01] returned to service per T308339
15:11 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
15:11 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:11 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1411
15:11 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1411
15:09 bking@cumin1001: START - Cookbook sre.dns.netbox
15:09 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
15:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:03 fabfur: disabling keepalive on port 80 for cp5024 https://gerrit.wikimedia.org/r/939707 (T342211)
14:59 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
14:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:58 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
14:54 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
14:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
14:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:49 robh: mw141[23] returned to service per T308339. ignore typo of mw1414 it is uninvolved
14:48 robh: mw141[34] returned to service per T308339
14:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1412
14:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1412
14:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1413
14:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1413
14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5003.wikimedia.org
14:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
14:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5003.wikimedia.org
14:30 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
14:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
14:21 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
14:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
14:19 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
14:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
14:16 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:14 bking@cumin1001: START - Cookbook sre.dns.netbox
14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
14:14 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
14:07 robh: mw141[23] downtimes and relocating per T308339
13:54 Lucas_WMDE: pulled tests: Test setting names (T342249) to deploy1002 (no scap sync needed, tests-only change)
13:46 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
13:46 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:46 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
13:45 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
13:42 bking@cumin1001: START - Cookbook sre.dns.netbox
13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
13:42 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:42 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
13:42 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
13:39 bking@cumin1001: START - Cookbook sre.dns.netbox
13:39 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
13:31 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
13:29 fabfur: aborted previous operations, no need to disable puppet to apply that CR (https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661) (T342211)
13:27 fabfur: temporarily disable puppet on cp3052 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661 (T342211)
13:26 Lucas_WMDE: UTC afternoon backport+config window done
13:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
13:13 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) (duration: 10m 47s)
13:04 lucaswerkmeister-wmde@deploy1002: ssastry and lucaswerkmeister-wmde: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:02 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)
12:43 joal@deploy1002: Finished deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs (duration: 00m 14s)
12:43 joal@deploy1002: Started deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs
12:27 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/services/ipoid: apply
12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/services/ipoid: apply
12:22 jbond: switch puppetboard.wikimedia.org to use puppet7 infrastructure
12:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1016.eqiad.wmnet
11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
11:47 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
11:45 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1016.eqiad.wmnet
11:13 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2] (duration: 01m 43s)
11:12 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2]
11:11 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2] (duration: 00m 04s)
11:11 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2]
11:09 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2] (duration: 10m 24s)
10:59 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2]
10:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:54 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
09:54 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:14 btullis@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 04s)
09:14 btullis@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49599 and previous config saved to /var/cache/conftool/dbconfig/20230719-091205-root.json
09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49598 and previous config saved to /var/cache/conftool/dbconfig/20230719-090328-root.json
08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49597 and previous config saved to /var/cache/conftool/dbconfig/20230719-085700-root.json
08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49596 and previous config saved to /var/cache/conftool/dbconfig/20230719-084823-root.json
08:45 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49595 and previous config saved to /var/cache/conftool/dbconfig/20230719-084156-root.json
08:38 dcausse: closing the UTC morning backport window
08:37 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 59s)
08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49594 and previous config saved to /var/cache/conftool/dbconfig/20230719-083319-root.json
08:30 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:29 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49593 and previous config saved to /var/cache/conftool/dbconfig/20230719-082651-root.json
08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49592 and previous config saved to /var/cache/conftool/dbconfig/20230719-081814-root.json
08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49591 and previous config saved to /var/cache/conftool/dbconfig/20230719-081146-root.json
08:10 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 36s)
08:04 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49590 and previous config saved to /var/cache/conftool/dbconfig/20230719-080309-root.json
08:02 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49589 and previous config saved to /var/cache/conftool/dbconfig/20230719-075642-root.json
07:54 _joe_: ran scap pull, pool on parse1002 after powercycling
07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49588 and previous config saved to /var/cache/conftool/dbconfig/20230719-074804-root.json
07:47 _joe_: powercycling parse1002, console blank, unreachable to network
07:46 dcausse@deploy1002: Backport cancelled.
07:45 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49587 and previous config saved to /var/cache/conftool/dbconfig/20230719-074137-root.json
07:36 dcausse@deploy1002: Finished scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension (duration: 17m 44s)
07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49586 and previous config saved to /var/cache/conftool/dbconfig/20230719-073300-root.json
07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49585 and previous config saved to /var/cache/conftool/dbconfig/20230719-072632-root.json
07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P49584 and previous config saved to /var/cache/conftool/dbconfig/20230719-072207-root.json
07:20 dcausse@deploy1002: dcausse and abi: Backport for Add channel for TtmServerMessageUpdate of Translate extension synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
07:18 dcausse@deploy1002: Started scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension
07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49583 and previous config saved to /var/cache/conftool/dbconfig/20230719-071755-root.json
07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P49582 and previous config saved to /var/cache/conftool/dbconfig/20230719-071204-root.json
06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49581 and previous config saved to /var/cache/conftool/dbconfig/20230719-062313-root.json
06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49580 and previous config saved to /var/cache/conftool/dbconfig/20230719-060809-root.json
05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49579 and previous config saved to /var/cache/conftool/dbconfig/20230719-055304-root.json
05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49578 and previous config saved to /var/cache/conftool/dbconfig/20230719-053759-root.json
05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49577 and previous config saved to /var/cache/conftool/dbconfig/20230719-052254-root.json
05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49576 and previous config saved to /var/cache/conftool/dbconfig/20230719-050750-root.json
04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49575 and previous config saved to /var/cache/conftool/dbconfig/20230719-045245-root.json
04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49574 and previous config saved to /var/cache/conftool/dbconfig/20230719-043740-root.json
00:16 eileen: civicrm upgraded from 67c526e7 to 7642b3d9

2023-07-18

22:51 brett@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on P{doh5002*} and A:wikidough
22:44 brett@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on P{doh5002*} and A:wikidough
22:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
22:34 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:32 bking@cumin1001: START - Cookbook sre.dns.netbox
22:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
22:24 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
22:24 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:24 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
22:23 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
22:18 bking@cumin1001: START - Cookbook sre.dns.netbox
22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
22:18 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:18 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
22:16 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
22:12 bking@cumin1001: START - Cookbook sre.dns.netbox
22:12 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
22:06 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
22:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
21:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host analytics1073.eqiad.wmnet
21:22 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 07m 42s)
21:15 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
21:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
21:15 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:14 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
21:14 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:13 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
21:10 bking@cumin1001: START - Cookbook sre.dns.netbox
21:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
21:03 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
21:03 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:03 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
21:02 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 10m 01s)
21:02 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
20:54 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:52 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
20:48 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
20:43 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:43 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
20:41 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
20:28 urbanecm@deploy1002: Finished scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) (duration: 10m 28s)
20:26 bking@cumin1001: START - Cookbook sre.dns.netbox
20:26 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
20:19 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes de
20:17 urbanecm@deploy1002: Started scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812)
20:16 urbanecm@deploy1002: Finished scap: Backport for Deploy new logos (T341260 T341243 T341912) (duration: 09m 50s)
20:07 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Deploy new logos (T341260 T341243 T341912) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:06 urbanecm@deploy1002: Started scap: Backport for Deploy new logos (T341260 T341243 T341912)
19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
19:53 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
19:52 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
19:49 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
18:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 18s)
18:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
18:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
18:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
18:51 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 17s)
18:51 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
18:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
18:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1002.eqiad.wmnet with OS bookworm
18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.18 refs T340246
17:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
17:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
17:30 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1016
17:30 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1016
17:29 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
17:29 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
17:27 robh@cumin1001: START - Cookbook sre.dns.netbox
17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
17:21 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
17:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
17:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough-drmrs and A:wikidough
17:19 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
17:16 bking@cumin1001: START - Cookbook sre.dns.netbox
17:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
17:07 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
17:04 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough-drmrs and A:wikidough
17:02 dancy@deploy1002: Installation of scap version "4.55.0" completed for 605 hosts
17:01 dancy@deploy1002: Installing scap version "4.55.0" for 605 hosts
16:33 dancy@deploy1002: Pruned MediaWiki: 1.41.0-wmf.16 (duration: 02m 11s)
16:30 dancy@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.18 refs T340246 (duration: 46m 15s)
16:28 elukey: maintenance finished for kafka main-codfw
16:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1015
16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1015
16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1014
16:03 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:03 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1014
16:02 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
16:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
16:00 robh@cumin1001: START - Cookbook sre.dns.netbox
15:44 dancy@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.18 refs T340246
15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1016.eqiad.wmnet
15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
15:31 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
15:28 fabfur@cumin1001: START - Cookbook sre.dns.netbox
15:23 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1016.eqiad.wmnet
15:22 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
15:21 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1013
15:21 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1013
15:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
15:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
15:08 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1015.eqiad.wmnet
15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 fabfur@cumin1001: START - Cookbook sre.dns.netbox
15:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:57 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
14:56 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P49571 and previous config saved to /var/cache/conftool/dbconfig/20230718-145529-root.json
14:54 robh@cumin1001: START - Cookbook sre.dns.netbox
14:54 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1015.eqiad.wmnet
14:45 sukhe: dns2004 upgrade to pdns-rec 4.8.4: T341611
14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1014.eqiad.wmnet
14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
14:36 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
14:36 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
14:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:34 fabfur@cumin1001: START - Cookbook sre.dns.netbox
14:34 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:30 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:29 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1014.eqiad.wmnet
14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1013.eqiad.wmnet
14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
14:22 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
14:19 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
14:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
14:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
14:16 bking@cumin1001: START - Cookbook sre.dns.netbox
14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
14:16 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:16 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
14:12 fabfur@cumin1001: START - Cookbook sre.dns.netbox
14:10 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
14:05 XioNoX: asw2-esams# set interfaces xe-4/0/4 disable - T342121
14:04 jforrester@deploy1002: Finished scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) (duration: 08m 04s)
14:04 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1013.eqiad.wmnet
13:58 jforrester@deploy1002: jforrester and daimona: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:56 bking@cumin1001: START - Cookbook sre.dns.netbox
13:56 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
13:56 jforrester@deploy1002: Started scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260)
13:55 jforrester@deploy1002: Finished scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) (duration: 21m 19s)
13:43 jbond: upload python3-conftool_2.2.2-1+deb12u1
13:43 jbond: upload python3-conftool_2.2.2-1
13:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
13:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
13:41 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
13:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:35 jforrester@deploy1002: daimona and jforrester: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:35 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
13:35 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
13:34 jforrester@deploy1002: Started scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260)
13:26 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host analytics1073.eqiad.wmnet
13:26 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) (duration: 07m 52s)
13:20 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
13:20 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
13:20 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:18 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945)
13:17 jforrester@deploy1002: sync-world aborted: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 00m 06s)
13:17 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
13:16 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:15 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:13 jforrester@deploy1002: Finished scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 08m 40s)
13:12 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
13:10 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
13:09 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
13:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
13:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
13:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:07 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
13:07 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
13:06 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
13:06 jforrester@deploy1002: jforrester: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
13:04 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
13:04 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
13:00 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
12:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1074.eqiad.wmnet with OS bullseye
12:42 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
12:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
12:37 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:37 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:28 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
12:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
12:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1015.eqiad.wmnet
12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
12:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
12:04 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
12:01 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1015.eqiad.wmnet
11:51 jbond@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
11:50 jbond@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:27 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
11:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
11:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
11:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1074.eqiad.wmnet with OS bullseye
11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
11:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
11:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
11:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1071.eqiad.wmnet with OS bullseye
11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
11:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
10:57 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
10:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
10:42 topranks: repool esams after successful move of cr3-knams to new rack T337997
10:41 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
10:40 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr3-knams,cr3-knams IPv6
10:24 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr3-knams,cr3-knams IPv6
10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
10:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
10:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
10:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
10:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
10:02 fabfur: fix last entry: correct CR is https://gerrit.wikimedia.org/r/939242
10:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
10:02 fabfur: enable puppet on A:cp-esams for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
10:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
10:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1071.eqiad.wmnet with OS bullseye
09:52 fabfur: disable puppet on A:cp-esams to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939242 (T340983)
09:51 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/homer/public/+/938819 via homer to cr-eqiad & cr-codfw
09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
09:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
09:28 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:27 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
09:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
09:24 XioNoX: remove asw-b1-codfw from asw-b-codfw VC - T342076
09:21 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
09:17 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
09:16 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
09:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
09:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
09:08 ladsgroup@deploy1002: Finished scap: Backport for ores: use envoy proxy for Lift Wing (T319170) (duration: 14m 56s)
09:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
08:58 fabfur: enable puppet on A:cp-eqiad for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
08:57 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores: use envoy proxy for Lift Wing (T319170) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
08:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
08:55 fabfur: disable puppet on A:cp-eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939235 (T340983)
08:53 ladsgroup@deploy1002: Started scap: Backport for ores: use envoy proxy for Lift Wing (T319170)
08:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
08:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
08:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
08:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
08:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
08:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
08:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
08:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
08:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
08:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
08:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
08:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
08:17 fabfur: enable puppet on A:cp-drmrs for https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983) (hosts will run puppet with the usual schedule)
08:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
08:13 fabfur: disable puppet on A:cp-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983)
08:09 topranks: cr3-knams going offline for move
08:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
08:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
07:16 elukey: restart kafka main-codfw rebalances (long maintenance) - T341558
06:48 XioNoX: disable asw-b-codfw:ae0 (to cloudsw1-b1-codfw) - T342076
06:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router
06:36 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router

2023-07-17

21:57 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 02m 10s)
21:55 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
21:55 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 136m 46s)
21:53 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
21:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1001.eqiad.wmnet with OS bookworm
21:37 eileen: civicrm upgraded from 2c60d58d to 67c526e7
21:19 jgleeson: payments-wiki upgraded from d76b9085 to c9e298c9
21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
21:00 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
21:00 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
20:59 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:59 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
20:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
20:43 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
20:19 taavi: taavi@mwmaint1002 ~ $ echo "https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-bn.svg" | mwscript purgeList.php --wiki enwiki
20:13 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update wordmark (T341910) (duration: 08m 34s)
20:06 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update wordmark (T341910) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
20:05 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update wordmark (T341910)
20:03 bking@cumin1001: START - Cookbook sre.dns.netbox
20:03 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
19:38 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
18:58 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
18:58 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
17:34 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 41s)
17:31 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
17:19 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 50s)
17:18 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
16:42 urbanecm@deploy1002: Finished scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) (duration: 08m 43s)
16:34 urbanecm@deploy1002: urbanecm: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
16:33 urbanecm@deploy1002: Started scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994)
16:29 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
16:29 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
16:28 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
16:12 elukey: stop kafka-main codfw maintenance - T341558
16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
16:07 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
16:04 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
16:02 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
15:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
15:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
15:50 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
15:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
15:49 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:49 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
15:48 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
15:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
15:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
15:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
15:14 dancy@deploy1002: Installing scap version "4.54.0" for 605 hosts
15:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
15:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
15:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
15:02 sukhe: dns5003 upgrade to pdns-rec 4.8.4: T341611
14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
14:57 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:57 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1072.eqiad.wmnet with OS bullseye
14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
14:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
14:36 elukey: restart rsyslog on centrallog1002 ("peer did not provide a certificate, not permitted to talk to it")
14:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
14:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
14:24 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ores2003.codfw.wmnet
14:21 klausman@puppetmaster1001: conftool action : set/pooled=no; selector: name=ores2003.codfw.wmnet
14:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
14:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
14:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
14:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
14:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
14:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
14:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
14:10 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
14:10 elukey: start kafka partitions rebalance for main-codfw (long running maintenance, see https://phabricator.wikimedia.org/T341558)
14:09 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
14:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
13:54 lucaswerkmeister-wmde: Deployed security patch for T340217
13:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
13:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
13:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1072.eqiad.wmnet with OS bullseye
13:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
13:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
13:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
13:46 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
13:43 akosiaris: deploy removal of nutcracker from thumbor. T318695
13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:40 fabfur: reimaging cp4037 as preparatory test for knams migration
13:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
13:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
13:37 taavi@deploy1002: Finished scap: Backport for NewImpact: fix undefined log function (T341865) (duration: 10m 19s)
13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
13:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
13:28 taavi@deploy1002: taavi and urbanecm: Backport for NewImpact: fix undefined log function (T341865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:27 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php --wiki huwiktionary --fix # T341926
13:27 taavi@deploy1002: Started scap: Backport for NewImpact: fix undefined log function (T341865)
13:26 taavi@deploy1002: Finished scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) (duration: 19m 48s)
13:25 taavi: taavi@deploy1002 ~ $ mwscript namespaceDupes.php --wiki mnwwiktionary --fix # T341940
13:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
13:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
13:16 taavi@deploy1002: taavi and anzx: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqia
13:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
13:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
13:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
13:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
13:07 fabfur: enabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983) (hosts will run puppet with the usual schedule)
13:06 taavi@deploy1002: Started scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958)
13:04 fabfur: run puppet on cp2027 to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
13:03 sukhe: run authdns-update to depool esams
12:58 fabfur: disabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:54 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1005.wikimedia.org
12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
11:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
11:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
11:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
11:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
11:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
11:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
11:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
11:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1070.eqiad.wmnet with OS bullseye
11:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
11:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
11:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
11:26 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
11:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
11:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
11:12 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1005.wikimedia.org
11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
11:10 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
11:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
11:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
11:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
10:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
10:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
10:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
10:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
10:44 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1070.eqiad.wmnet with OS bullseye
10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
10:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
10:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
10:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
10:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
10:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
10:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
10:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
10:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
10:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
10:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
09:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
09:48 fabfur: enabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983) (hosts will run puppet with the usual schedule)
09:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
09:44 fabfur: disabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983)
09:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
09:42 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-stretch1001.eqiad.wmnet
09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
09:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
09:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
09:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
09:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
09:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
09:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
09:27 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
09:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
09:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
09:18 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
09:18 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
09:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
08:51 fabfur: enable puppet on A:cp-eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
08:37 fabfur: enable puppet on cp5024 and cp5032 to deploy 938002
08:30 fabfur: disable puppet on all cp* hosts in eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
04:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484 (duration: 00m 08s)
04:33 hashar@deploy1002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484
04:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484 (duration: 00m 08s)
04:08 hashar@deploy1002: Started deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484

2023-07-16

23:23 eileen: civicrm upgraded from 562224c1 to 2c60d58d
17:20 apergos: starting rsync of sql/xml dumps files, pulling from dumpsdata1004, running in ariel screen session on dumpsdata1007, bw limited to 1G

2023-07-14

19:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:55 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
19:55 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
19:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1153.eqiad.wmnet with OS bullseye
19:19 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860 (duration: 00m 14s)
19:19 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860
18:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
16:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
16:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:25 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
14:22 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
10:43 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
10:42 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
10:41 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
10:40 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
10:38 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
10:38 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
09:02 klausman@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=ores2003.codfw.wmnet
09:02 klausman: Setting ores2003 to pooled=inactive while we attempt repairs/decide on decom
08:51 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
08:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
08:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
08:47 _joe_: deploying to mw on k8s for T341825
08:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
07:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
07:13 hashar@deploy1002: Finished deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543 (duration: 00m 08s)
07:13 hashar@deploy1002: Started deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543
07:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
07:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
07:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
07:04 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
07:04 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
06:16 oblivian@deploy1002: Started scap: (no justification provided)

2023-07-13

20:59 inflatador: bking@cumin1001 'disable puppet on hosts using zookeeper class T341792'
20:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@889e13f]: (no justification provided) (duration: 00m 23s)
20:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@889e13f]: (no justification provided)
20:29 taavi@deploy1002: Finished scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) (duration: 07m 38s)
20:23 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:22 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
20:19 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:17 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
20:16 taavi@deploy1002: Finished scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) (duration: 07m 47s)
20:10 taavi@deploy1002: taavi and arlolra: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:08 taavi@deploy1002: Started scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433)
18:11 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 03m 39s)
18:08 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
18:06 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint (duration: 00m 05s)
18:06 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint
16:40 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
16:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
16:14 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
16:13 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
15:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest2002
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
15:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:25 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest2002
15:18 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
14:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
14:43 elukey: depool ores2003 to allow DCops maintenance work
14:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
14:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
14:32 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:32 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:29 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:28 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:28 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:19 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:21 urbanecm: UTC afternoon B&C window done
13:19 urbanecm: Run `mwscript namespaceDupes.php --wiki=mnwwiktionary --fix` (T330689)
13:17 urbanecm@deploy1002: Finished scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) (duration: 09m 46s)
13:09 urbanecm@deploy1002: anzx and urbanecm: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:08 urbanecm@deploy1002: Started scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689)
12:47 Lucas_WMDE: Start `mwscript DiscussionTools:persistRevisionThreadItems ruwiki --current --all --start '["10086120"]'; touch ~/T315510-ruwiki-exited-$?` in tmux on mwmaint1002 (T315510)
11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:32 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1014.eqiad.wmnet
11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
11:29 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1014.eqiad.wmnet
10:57 vgutierrez: restarting pybal on lvs1020
10:35 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:34 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:19 fabfur: puppet enabled on cp3052 and cp5017 and new configuration applied (https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701)
10:15 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
10:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
10:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
10:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
10:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
10:11 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
10:07 fabfur: disable puppet on cp3052 and cp5017 to safely monitor https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701
10:04 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:04 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
10:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
09:11 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
09:11 elukey: increased kafka partitions for mediawiki.job.cirrusSearchLinksUpdate and mediawiki.job.cirrusSearchLinksUpdate (eqiad/codfw) - T341558
09:10 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
09:09 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
09:09 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
09:04 XioNoX: update NAT on pfw3-eqiad - T340252
08:14 hashar: Restarting CI Jenkins for plugin installation
07:55 apergos: UTC morning backport and config deployment window done
07:53 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) (duration: 08m 20s)
07:47 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
07:45 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177)
07:40 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) (duration: 09m 43s)
07:32 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:31 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177)
07:27 ariel@deploy1002: Finished scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) (duration: 08m 39s)
07:20 ariel@deploy1002: ariel and superpes: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
07:18 ariel@deploy1002: Started scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026)
07:14 ariel@deploy1002: Finished scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) (duration: 08m 35s)
07:07 ariel@deploy1002: superpes and ariel: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
07:05 ariel@deploy1002: Started scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136)
04:22 eileen: civicrm upgraded from 4521c00a to 562224c1
03:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
02:52 eileen: civicrm upgraded from 882e2310 to 4521c00a
02:33 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
01:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
01:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
01:31 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
00:53 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
00:52 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
00:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
00:47 eileen: config revision changed from ccc33b1e to e3e5a11d - re-enabled jobs
00:33 eileen: civicrm upgraded from 1bfc3b17 to 882e2310
00:17 eileen: drush @wmff civicrm-upgrade-db
00:08 eileen: config revision changed from 6ac88ea8 to ccc33b1e (I pushed the upgrade code out)

2023-07-12

23:59 eileen: config revision changed from c543419d to ccc33b1e
23:57 eileen: config revision changed from 6ac88ea8 to c543419d
23:47 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
23:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
22:18 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
21:49 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
21:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
21:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
21:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
21:09 TheresNoTime: close UTC late backport window
21:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
21:08 samtar@deploy1002: Finished scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) (duration: 08m 10s)
21:02 samtar@deploy1002: stang and samtar: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
21:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
21:00 samtar@deploy1002: Started scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708)
21:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:59 samtar@deploy1002: Finished scap: Backport for Fix mediawiki.special_diff_interactions configuration (duration: 08m 47s)
20:52 samtar@deploy1002: samtar and urbanecm: Backport for Fix mediawiki.special_diff_interactions configuration synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:50 samtar@deploy1002: Started scap: Backport for Fix mediawiki.special_diff_interactions configuration
20:49 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 26m 41s)
20:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:29 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:24 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
20:24 samtar@deploy1002: jsn and samtar: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:23 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
20:22 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
20:21 samtar@deploy1002: Finished scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) (duration: 09m 27s)
20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
20:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
20:13 samtar@deploy1002: samtar: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:11 samtar@deploy1002: Started scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112)
20:10 samtar@deploy1002: Finished scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) (duration: 08m 04s)
20:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
20:04 samtar@deploy1002: samtar: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:02 samtar@deploy1002: Started scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707)
19:57 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
18:39 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.17 refs T340245 (duration: 06m 16s)
18:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.17 refs T340245
18:24 dduvall@deploy1002: Finished scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) (duration: 08m 22s)
18:17 dduvall@deploy1002: abi and dduvall: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
18:16 dduvall@deploy1002: Started scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627)
17:10 sukhe: restart pybal on lvs1018
17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bullseye
16:59 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
16:59 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
16:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
16:42 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
16:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
16:41 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
16:41 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1013.eqiad.wmnet
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
16:38 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
16:37 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
16:37 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
16:35 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
16:32 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided) (duration: 00m 58s)
16:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided)
16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1013.eqiad.wmnet
16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bullseye
16:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
15:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
15:43 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
15:42 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:42 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:35 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:34 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
15:11 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
15:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
15:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
15:05 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
15:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
14:49 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656) (duration: 08m 03s)
14:48 sukhe: upgrade dns2004 to gdnsd 3.99.0~alpha2
14:42 lucaswerkmeister-wmde@deploy1002: tgr and lucaswerkmeister-wmde: Backport for Temporarily allow OAuth on non-API entry points again (T341656) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
14:41 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656)
14:17 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
14:11 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
14:07 sukhe: dns4003: upgrade to pdns-rec 4.8.4: T341611
13:59 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:56 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:46 sukhe: doh6001: upgrade to pdns-rec 4.8.4: T341611
13:44 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
13:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
13:42 sukhe: reprepro -C main include bullseye-wikimedia pdns-recursor_4.8.4-1+wmf11u1_amd64.changes: T341611
13:40 Lucas_WMDE: UTC afternoon backport+config window done
13:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add new campaign_events.event_answers_status column (T341142) (duration: 07m 59s)
13:34 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:31 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
13:31 lucaswerkmeister-wmde@deploy1002: daimona and lucaswerkmeister-wmde: Backport for Add new campaign_events.event_answers_status column (T341142) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:30 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add new campaign_events.event_answers_status column (T341142)
13:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) (duration: 07m 40s)
13:28 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
13:27 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
13:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 30s)
13:24 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
13:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 09s)
13:23 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:22 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
13:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566)
13:06 moritzm: installing node-tough-cookie security updates
12:54 moritzm: rebalance ganeti codfw/D following reboots
12:52 moritzm: imported wikidiff2 1.14.1-0+wmf1+buster1+icu67u1 to component/icu67 T340087 T329491
12:44 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
12:30 akosiaris: upgrade wikidiff2 1.13.0-1+wmf1+buster1 -> 1.14.1-0+wmf1+buster1 on mw-canary hosts T340087
12:11 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
11:50 moritzm: installing apache2 security updates on Bullseye
11:48 moritzm: installing wireshark security updates
11:44 moritzm: rebalance ganeti codfw/C following reboots
11:43 hnowlan: rebuilding fluent-bit image to include wmf-certificates
11:33 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
11:26 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
11:00 ladsgroup@deploy1002: Finished scap: Backport for fix: add request headers properly (T319170) (duration: 10m 20s)
10:51 ladsgroup@deploy1002: ladsgroup: Backport for fix: add request headers properly (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:50 ladsgroup@deploy1002: Started scap: Backport for fix: add request headers properly (T319170)
10:49 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) (duration: 08m 09s)
10:42 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
10:40 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251)
09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
09:35 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
09:34 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
09:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
08:55 moritzm: move secondary instances away from ganeti2014 T341546
07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
07:29 ayounsi@cumin1001: START - Cookbook sre.network.tls
07:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
07:26 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:55 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
06:47 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
06:45 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:37 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:37 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:18 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:16 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:11 ayounsi@cumin1001: START - Cookbook sre.network.tls
06:06 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
06:06 ayounsi@cumin1001: START - Cookbook sre.network.tls
04:36 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
04:35 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
03:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
03:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
03:39 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
03:29 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
03:29 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
03:16 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
03:15 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
03:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
03:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
01:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host
01:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host

2023-07-11

21:51 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
20:45 urbanecm@deploy1002: Finished scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) (duration: 11m 10s)
20:35 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:33 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566)
20:32 urbanecm@deploy1002: Sync cancelled.
20:26 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:25 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary
20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
20:14 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
19:49 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:48 denisse@cumin1001: START - Cookbook sre.dns.netbox
18:57 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.17 refs T340245
18:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
17:53 dduvall@deploy1002: Pruned MediaWiki: 1.41.0-wmf.15 (duration: 02m 16s)
17:50 dduvall@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.17 refs T340245 (duration: 45m 50s)
17:05 dduvall@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.17 refs T340245
16:52 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
16:28 vgutierrez: reenabling puppet in cp6002
16:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
16:08 sukhe: upgrade dns1004 to gdnsd 3.99.0~alpha2
16:04 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
16:03 Lucas_WMDE: previous backport also included Remove oversampling for Navigation Timing extension. (T337858)
16:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add option for html label in Menu template (T340217) (duration: 09m 15s)
15:54 lucaswerkmeister-wmde@deploy1002: jdlrobson and lucaswerkmeister-wmde: Backport for Add option for html label in Menu template (T340217) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
15:54 Krinkle: Deployed https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/930712 ("Remove oversampling for Navigation Timing extension.")
15:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add option for html label in Menu template (T340217)
15:48 krinkle@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC (duration: 17m 03s)
15:31 krinkle@deploy1002: Locking from deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC
15:26 krinkle@deploy1002: Sync cancelled.
15:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
15:22 krinkle@deploy1002: phedenskog and krinkle: Backport for Remove oversampling for Navigation Timing extension. (T337858) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
15:21 krinkle@deploy1002: Started scap: Backport for Remove oversampling for Navigation Timing extension. (T337858)
15:17 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
15:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
14:49 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
14:19 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
14:17 moritzm: restarting apache on mw canaries
14:17 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
14:15 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
14:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
14:02 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:00 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
13:59 moritzm: installing yajl security updates
13:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
13:49 moritzm: rebalance ganeti group eqiad/d after reboots
13:42 jgiannelos@deploy1002: Finished deploy [restbase/deploy@930f075]: (no justification provided) (duration: 19m 50s)
13:33 urbanecm@deploy1002: Finished scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) (duration: 11m 33s)
13:23 urbanecm@deploy1002: urbanecm and anzx: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
13:22 jgiannelos@deploy1002: Started deploy [restbase/deploy@930f075]: (no justification provided)
13:21 urbanecm@deploy1002: Started scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276)
13:21 urbanecm@deploy1002: Finished scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399) (duration: 07m 15s)
13:14 urbanecm@deploy1002: Started scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399)
13:13 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) (duration: 09m 45s)
13:05 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:03 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)
13:00 jbond@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
12:59 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
12:59 jbond@cumin1001: END (ERROR) - Cookbook sre.postgresql.postgres-init (exit_code=97)
12:53 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
12:00 XioNoX: decom datahop in knams - T340049
11:42 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
11:38 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
11:37 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
11:27 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
11:17 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
11:06 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
10:46 moritzm: installing libx11 security updates
10:44 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:44 ladsgroup@deploy1002: Sync cancelled.
10:39 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
10:38 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
10:37 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251)
10:19 moritzm: rebalance ganeti group codfw/C after reboots
10:03 ladsgroup@deploy1002: Finished scap: Backport for Override liftwing hostname (T319170) (duration: 14m 34s)
09:52 ladsgroup@deploy1002: ladsgroup: Backport for Override liftwing hostname (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:52 jbond: disable puppet fleet wide to deploy 936273
09:49 ladsgroup@deploy1002: Started scap: Backport for Override liftwing hostname (T319170)
09:47 jbond: re-enable puppet
09:43 hashar: Updating Zuul configuration, which was stale at a version from March 29th after the switchover from contint2001 to contint2002 | T324659 T341556
09:36 jbond: deploy gerrit:936273 enable submitting data to puppetdb7
09:30 jbond: disable puppet fleet wide to deploy 936273
09:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
09:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
09:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
09:06 jayme: enabled puppet on 'P{R:Package = envoyproxy}'
09:01 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
09:01 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
08:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
08:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
08:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
08:43 volans: previous downtiming completed
08:40 volans: downtiming service 'Check no envoy runtime configuration is left persistent' on envoy hosts
08:39 jayme: disabled puppet on 'P{R:Package = envoyproxy}'
08:19 godog: upgrade prometheus to 2.24.1+ds-1+wmf2 on cloudmetrics*
08:03 hashar: Stopping Jenkins and Zuul for server switch over
08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
07:55 kart_: Updated MinT to 2023-07-10-051738-production (T341335, T333969)
07:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:49 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:42 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:38 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:36 moritzm: failover broken ganeti2014 node
07:28 moritzm: powercycle ganeti2014
07:22 moritzm: installing libxpm security updates
07:08 moritzm: rebalance ganeti in drmrs after reboots
06:59 elukey: restart kube-apiserver on ml-serve-ctrl1* as attempt to resolve spikes in latencies
06:36 moritzm: rebalance ganeti group eqiad/B after reboots
05:24 rzl: imported otelcol-contrib 0.81.0 to buster-wikimedia and bullseye-wikimedia in component thirdparty/otelcol-contrib
04:34 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
02:05 mutante: LDAP - added urbanecm to wmf group, removed from nda group (conversion volunteer to staff) T341443

2023-07-10

23:11 Krinkle: krinkle@xhgui1001$ Define new `xhgui.watches` table via xhguiadmin@m2-master.eqiad.wmnet database, ref T341499
22:12 maryum: Deployed security patch for T340200
21:42 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
21:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
21:37 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 52s)
21:36 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
20:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
20:43 TheresNoTime: close UTC late backport window
20:42 samtar@deploy1002: Finished scap: Backport for Revert "log additional events on Special:Diff|MobileDiff" (duration: 07m 27s)
20:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
20:36 samtar@deploy1002: samtar: Backport for Revert "log additional events on Special:Diff|MobileDiff" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
20:34 samtar@deploy1002: Started scap: Backport for Revert "log additional events on Special:Diff|MobileDiff"
20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49544 and previous config saved to /var/cache/conftool/dbconfig/20230710-202536-ladsgroup.json
20:23 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 21m 42s)
20:23 inflatador: bking@wdqs1006 Restart wdqs-blazegraph to hopefully clear the free allocators alerts
20:19 TheresNoTime: syncing https://gerrit.wikimedia.org/r/c/936748 untested (T326212) for test after sync
20:14 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/static-tendril
20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49541 and previous config saved to /var/cache/conftool/dbconfig/20230710-201031-ladsgroup.json
20:07 eileen: civicrm upgraded from 0ddd1a51 to 7caf5274
20:03 samtar@deploy1002: samtar and jsn: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:02 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
20:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49540 and previous config saved to /var/cache/conftool/dbconfig/20230710-195527-ladsgroup.json
19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49538 and previous config saved to /var/cache/conftool/dbconfig/20230710-194022-ladsgroup.json
19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 T341511', diff saved to https://phabricator.wikimedia.org/P49537 and previous config saved to /var/cache/conftool/dbconfig/20230710-191511-ladsgroup.json
19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary T341511', diff saved to https://phabricator.wikimedia.org/P49536 and previous config saved to /var/cache/conftool/dbconfig/20230710-191259-ladsgroup.json
19:12 Amir1: Starting s1 codfw failover from db2112 to db2103 - T341511
18:59 sukhe: running authdns-update
18:57 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:56 sukhe: finished commissioning new DNS hosts in eqiad: dns100[4-6]. Decommissioned dns100[1-3].
18:55 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
18:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns[1002-1003].wikimedia.org
18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
18:46 sukhe@cumin2002: START - Cookbook sre.dns.netbox
18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T341511', diff saved to https://phabricator.wikimedia.org/P49535 and previous config saved to /var/cache/conftool/dbconfig/20230710-184521-ladsgroup.json
18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
18:43 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
18:38 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns[1002-1003].wikimedia.org
18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
18:32 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
18:31 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
18:29 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
18:26 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
18:03 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:55 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:51 sukhe: homer "mr*" commit "update ntp_servers (remove dns100[2-3], add dns100[5-6])"
17:26 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:24 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
16:52 sukhe: rolling restart of ntp.service on A:dns-rec
16:44 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936757 remove DNS hosts dns1002 and dns1003"
16:26 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.154.153 208.80.154.77 ]
16:26 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model (duration: 00m 20s)
16:25 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model
16:07 taavi@deploy1002: Finished scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) (duration: 07m 47s)
16:00 taavi@deploy1002: taavi: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
15:59 taavi@deploy1002: Started scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470)
15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 30s)
15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 31s)
15:30 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936720 add new DNS host dns1006"
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bullseye
15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:23 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
15:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
15:00 moritzm: rebalance ganeti group eqiad/A after reboots
14:57 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:57 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
14:51 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:51 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bullseye
14:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:46 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:33 fabfur: add new dns host dns1005
14:28 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:28 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:27 sukhe@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
14:26 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
14:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bullseye
14:23 sukhe@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
14:22 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:22 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:22 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
14:19 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:19 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:15 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:15 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:10 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:10 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
14:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
14:01 ladsgroup@deploy1002: Finished scap: Backport for Set commons to READ_NEW for externallinks migration (T335343) (duration: 09m 22s)
13:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
13:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
13:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
13:53 ladsgroup@deploy1002: ladsgroup: Backport for Set commons to READ_NEW for externallinks migration (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
13:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1002.eqiad.wmnet
13:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host karapace1002.eqiad.wmnet with OS bullseye
13:51 ladsgroup@deploy1002: Started scap: Backport for Set commons to READ_NEW for externallinks migration (T335343)
13:47 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bullseye
13:46 ladsgroup@deploy1002: Finished scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) (duration: 11m 03s)
13:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
13:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
13:36 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
13:36 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
13:35 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)
13:28 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
13:28 ladsgroup@deploy1002: Finished scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) (duration: 09m 02s)
13:27 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
13:27 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
13:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host karapace1002.eqiad.wmnet with OS bullseye
13:22 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
13:21 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) karapace1002.eqiad.wmnet on all recursors
13:21 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache karapace1002.eqiad.wmnet on all recursors
13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
13:20 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
13:20 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
13:19 ladsgroup@deploy1002: Started scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170)
13:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) (duration: 10m 26s)
13:16 btullis@cumin1001: START - Cookbook sre.dns.netbox
13:16 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1002.eqiad.wmnet
13:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318)
12:34 claime: Running puppet on cp-text hosts - T341463
12:33 claime: Sending 1% of global traffic to mw-on-k8s - T341463
12:04 moritzm: failover ganeti masters in drmrs
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
11:55 moritzm: installing avahi security updates
11:52 vgutierrez: repool cp2037 (debugging finished) - T320967
11:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
11:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
11:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
11:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
11:14 moritzm: remove unused VM netflow6002 T330884
11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
11:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
10:55 moritzm: failover ganeti master in eqiad to ganeti1029
10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
10:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
10:50 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
10:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1002.eqiad.wmnet with OS bullseye
10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
10:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
10:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
10:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
10:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2037.codfw.wmnet
10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp2037.codfw.wmnet
10:34 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
10:25 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
10:12 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1012.*
10:12 claime: repooling parse1012.eqiad.wmnet
10:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
10:11 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1002.eqiad.wmnet with OS bullseye
10:05 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1001.eqiad.wmnet with OS bullseye
10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
10:02 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
09:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
09:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
09:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
09:44 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
09:39 moritzm: rebalance ganeti group codfw/B after reboots
09:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
09:35 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
09:33 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
09:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
09:28 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
09:25 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
09:23 moritzm: rebalance ganeti group codfw/A after reboots
09:14 vgutierrez: depool cp2037 (debugging ATS cacheability issues) - T320967
09:12 moritzm: restarting mw canaries to pick up libxpm security update
09:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
09:07 moritzm: installing cups security updates (libs only)
09:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
09:04 moritzm: rebalance ganeti clusters in esams/ulsfo/eqsin following reboots
09:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
08:58 lucaswerkmeister-wmde: Deployed security patch for T340220
08:57 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
08:48 moritzm: installing libxpm security updates
08:47 kart_: Updated cxserver to 2023-07-10-065135-production (T337719, T340989)
08:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
08:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
08:44 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
08:41 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
08:40 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
08:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
08:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
08:24 claime: Running puppet on cp-text hosts - T337489
08:11 hashar: UTC morning backport window completed.
08:11 hashar@deploy1002: Finished scap: Backport for Deploy action blocks on bnwiki (T340904) (duration: 08m 15s)
08:04 moritzm: installing c-ares security updates on buster
08:04 hashar@deploy1002: hashar and mdsshakil: Backport for Deploy action blocks on bnwiki (T340904) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
08:03 hashar@deploy1002: Started scap: Backport for Deploy action blocks on bnwiki (T340904)
08:02 hashar@deploy1002: Finished scap: Backport for thwiki: Update logos from commons (T341407) (duration: 25m 32s)
08:00 moritzm: installing flask security updates on bullseye
07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
07:45 hashar@deploy1002: func and hashar: Backport for thwiki: Update logos from commons (T341407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
07:36 hashar@deploy1002: Started scap: Backport for thwiki: Update logos from commons (T341407)
07:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
07:30 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
07:30 moritzm: installing libgstreamer-plugins-base1.0-0 security updates
07:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
07:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
07:22 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
07:21 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
07:21 hashar: deploy1002: removed empty untracked directory from MediaWiki staging area: `rmdir /srv/mediawiki-staging/wmf-config/scap/log/ && rmdir /srv/mediawiki-staging/wmf-config/scap/` | T341292
07:20 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
07:20 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
07:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
07:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
07:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
06:43 godog: add 100G to prometheus/k8s in codfw
01:06 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
01:06 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-09

14:51 apergos: swapped dumpsdata1003 in as the new NFS share for misc dumps; dumpsdata1002 is now a spare, to be decommissioned. dumpsdata1003 is running bullseye.
04:04 apergos: rsync misc dumps output files from dumpsdata1002 to dumpsdata1003, in ariel screen session on dumpsdata1003, bwlimit set to 1G

2023-07-08

03:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-07

22:55 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
22:55 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
22:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
22:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
22:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye
21:59 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
21:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 57s)
21:23 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
21:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
20:52 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
20:50 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
20:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
19:33 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:33 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
19:32 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
19:12 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
18:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
18:08 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
17:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
17:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
17:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
17:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
16:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
16:38 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
16:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
16:20 hashar: Restarting CI Jenkins due to confusion over the next build number, leading to intermittent 404s when browsing console links | T341348
16:00 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs2020.codfw.wmnet
15:53 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:51 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
15:50 bking@cumin1001: conftool action : set/weight=10; selector: name=wdqs2020.codfw.wmnet
15:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:47 aborrero@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
15:46 bking@cumin1001: conftool action : set/pooled=yes; selector: service=(wdqs|wdqs-ssl|wdqs-heavy-queries),name=wdqs2020.codfw.wmnet
15:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
15:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:05 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 50s)
15:04 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
14:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 49s)
14:57 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
14:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
14:50 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
14:50 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
14:49 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
14:49 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
14:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
14:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
13:59 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 07s)
13:59 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
13:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
13:58 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
12:50 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
12:17 hashar: Re-enabled zuul-merger on contint2001 and removed the Icinga maintenance window
12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
12:01 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
11:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
11:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
11:45 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:42 hashar: Enabled zuul-merger on contint1002, disabled it on contint2001 and marked that host as under maintenance in Icinga for the next two hours
11:27 hashar: Stopped zuul-merger on contint1002
11:17 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
11:04 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
11:02 aborrero@cumin1001: START - Cookbook sre.dns.netbox
10:13 moritzm: rebooting puppetdb1003
10:09 moritzm: rebooting puppetserver1001
10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet
10:05 moritzm: rebooting puppetserver2001
10:05 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
09:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
09:34 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
09:29 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
08:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
08:48 moritzm: installing bookworm kernel updates
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui2002.codfw.wmnet
08:47 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui2002.codfw.wmnet
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui1002.eqiad.wmnet
08:46 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui1002.eqiad.wmnet
08:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
08:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
01:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
00:28 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer

2023-07-06

23:14 mutante: mx1001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
23:08 mutante: mx2001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
21:12 thcipriani@deploy1002: Finished scap: Clean up font directory gerrit:723652 (duration: 06m 33s)
21:10 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 14m 56s)
21:06 thcipriani@deploy1002: Started scap: Clean up font directory gerrit:723652
21:04 thcipriani@deploy1002: Finished scap: Backport for pawikibooks: Install Quiz extension (T340613) (duration: 12m 19s)
20:55 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
20:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
20:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
20:53 thcipriani@deploy1002: stang and thcipriani: Backport for pawikibooks: Install Quiz extension (T340613) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:51 thcipriani@deploy1002: Started scap: Backport for pawikibooks: Install Quiz extension (T340613)
20:48 thcipriani@deploy1002: Finished scap: Backport for Update more logos with available SVGs (T338162) (duration: 12m 41s)
20:37 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Update more logos with available SVGs (T338162) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:35 thcipriani@deploy1002: Started scap: Backport for Update more logos with available SVGs (T338162)
20:16 thcipriani@deploy1002: Finished scap: Backport for Disable purging of old client hint data by default (T340959 T341076) (duration: 10m 08s)
20:07 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Disable purging of old client hint data by default (T340959 T341076) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:06 thcipriani@deploy1002: Started scap: Backport for Disable purging of old client hint data by default (T340959 T341076)
19:24 urbanecm@deploy1002: Finished scap: Backport for PageView: Fix base URL when using service proxy (T341191) (duration: 07m 16s)
19:17 urbanecm@deploy1002: Started scap: Backport for PageView: Fix base URL when using service proxy (T341191)
19:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
19:03 urbanecm@deploy1002: Finished scap: Backport for PageView: Route requests through restbase service proxy (T341191) (duration: 07m 27s)
18:57 urbanecm@deploy1002: urbanecm: Backport for PageView: Route requests through restbase service proxy (T341191) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
18:56 urbanecm@deploy1002: Started scap: Backport for PageView: Route requests through restbase service proxy (T341191)
17:33 cstone: tools upgraded from 2ca83336 to 10972e59
17:24 sukhe: sudo cumin -b1 -s300 'A:dns-rec' 'systemctl restart ntp.service'
17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
17:15 sukhe: homer "mr*" commit "update ntp_servers (add dns1004, remove dns1001)"
17:07 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
17:06 cstone: SmashPig upgraded from db23b998 to 95181a1b
17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns1001.wikimedia.org
17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
17:02 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
17:00 sukhe@cumin2002: START - Cookbook sre.dns.netbox
16:58 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:58 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:54 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns1001.wikimedia.org
16:49 sukhe: sudo cumin A:netbox 'run-puppet-agent': removing dns1001 before decomm cookbook
16:44 sukhe: homer "cr*-eqiad*" commit "decommission DNS host dns1001 (replaced by dns1004)"
16:31 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.155.108 208.80.154.134 ]
16:30 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
16:30 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:16 sukhe: homer "cr*-eqiad*" commit "Gerrit: 933917 add new DNS host dns1004"
16:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
16:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
15:54 elukey: changeprop's Kafka linger.ms set to 20s - T338357 (was 5ms; changeprop now waits a bit longer so it can batch messages and send them to Kafka in one go)
15:53 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
15:53 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
15:47 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
15:47 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:45 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
15:45 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
15:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
15:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
15:35 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
15:29 sukhe: restart ntp.service on A:dns-rec
15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
15:25 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bullseye
15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:20 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
15:16 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
15:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1003.eqiad.wmnet with OS bullseye
15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
14:55 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:54 aborrero@cumin1001: START - Cookbook sre.dns.netbox
14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1069.eqiad.wmnet
14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
14:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
14:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
14:42 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:37 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
14:35 hnowlan: reenabling puppet on A:cp
14:31 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1069.eqiad.wmnet
14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1068.eqiad.wmnet
14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:28 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
14:27 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:27 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
14:25 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
14:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
14:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:20 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1068.eqiad.wmnet
14:19 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
14:19 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1067.eqiad.wmnet
14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
14:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
14:14 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
14:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
14:13 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:12 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
14:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
14:06 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1067.eqiad.wmnet
14:05 hnowlan: disabling puppet on A:cp-text to test 935464
14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
14:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
14:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
13:55 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
13:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudlb1001.eqiad.wmnet
13:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb1001.eqiad.wmnet
13:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
13:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1066.eqiad.wmnet
13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
13:29 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:29 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
13:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
13:24 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
13:22 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
13:18 urbanecm@deploy1002: Finished scap: Backport for Enable global abuse filters on almost all projects (T341159) (duration: 10m 07s)
13:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
13:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
13:14 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1066.eqiad.wmnet
13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1065.eqiad.wmnet
13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
13:10 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
13:10 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
13:10 urbanecm@deploy1002: urbanecm: Backport for Enable global abuse filters on almost all projects (T341159) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:08 urbanecm@deploy1002: Started scap: Backport for Enable global abuse filters on almost all projects (T341159)
13:08 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
13:02 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1065.eqiad.wmnet
13:00 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1001.eqiad.wmnet with OS bullseye
12:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1064.eqiad.wmnet
12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
12:43 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
12:42 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
12:40 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
12:35 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1064.eqiad.wmnet
12:32 samtar@deploy1002: Finished scap: Backport for Revert "Add tag when reference added to the page" (T341202) (duration: 24m 04s)
12:21 samtar@deploy1002: matmarex and samtar: Backport for Revert "Add tag when reference added to the page" (T341202) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
12:15 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
12:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
12:08 samtar@deploy1002: Started scap: Backport for Revert "Add tag when reference added to the page" (T341202)
11:56 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
11:56 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:56 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts analytics1063.eqiad.wmnet
11:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1002
11:56 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1002
11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:55 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:55 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
11:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
11:54 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
11:53 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
11:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:50 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
11:50 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
11:50 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
11:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
11:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
11:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
11:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
11:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:48 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
11:47 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
11:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:46 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
11:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb1001
11:42 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
11:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
11:35 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) (duration: 07m 37s)
11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudswift1002.eqiad.wmnet
11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
11:27 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
11:27 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
11:27 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104)
11:25 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for foundationwiki: Enable WikibaseClient (T321967) (duration: 08m 58s)
11:24 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudswift1001.eqiad.wmnet
11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:23 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1002.eqiad.wmnet
11:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
11:19 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
11:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
11:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
11:17 lucaswerkmeister-wmde@deploy1002: varnent and lucaswerkmeister-wmde: Backport for foundationwiki: Enable WikibaseClient (T321967) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
11:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for foundationwiki: Enable WikibaseClient (T321967)
11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
11:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup (duration: 07m 35s)
11:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:10 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1001.eqiad.wmnet
11:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for outreachwiki: Set wmgWikibaseSiteGroup synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
11:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup
11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1062.eqiad.wmnet
11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
11:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
11:03 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
10:58 taavi@deploy1002: Finished scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL (duration: 08m 21s)
10:54 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
10:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:51 taavi@deploy1002: taavi: Backport for extdist: REL1_40 is stable, REL1_38 is EOL synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
10:49 taavi@deploy1002: Started scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL
10:47 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1062.eqiad.wmnet
10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1061.eqiad.wmnet
10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
10:08 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
10:05 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1061.eqiad.wmnet
09:35 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
09:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
09:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
09:11 elukey: restart kube-apiserver on ml-serve-ctrl2* as an attempt to fix LIST-related latency issues
09:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.16 refs T340244
08:55 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
08:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
08:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
08:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
08:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
08:49 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
08:45 fabfur: reenabled puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet
08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
08:17 fabfur: temporarily disabling puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet to apply 935760 (T340983)
08:03 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
07:31 kart_: Updated MinT to 2023-07-06-051402-production
07:29 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
07:29 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:17 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
07:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
07:04 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
06:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
02:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
02:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
02:06 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
02:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
00:22 eileen: civicrm upgraded from 4ca2008d to 0ddd1a51
00:03 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
00:02 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-05

22:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
22:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
22:36 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2002
22:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
22:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
22:33 denisse@cumin1001: START - Cookbook sre.dns.netbox
22:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2002
22:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui1002
22:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
22:27 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
22:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
22:23 mutante: registry1003 - sudo systemctl start build-hompage
22:17 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui1002
21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:04 urbanecm@deploy1002: Finished scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) (duration: 08m 22s)
20:57 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
20:55 urbanecm@deploy1002: Started scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162)
20:55 urbanecm@deploy1002: Finished scap: Backport for Update various logos where SVGs are available (T338162) (duration: 11m 10s)
20:45 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Update various logos where SVGs are available (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
20:44 urbanecm@deploy1002: Started scap: Backport for Update various logos where SVGs are available (T338162)
20:31 urbanecm@deploy1002: Finished scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) (duration: 12m 38s)
20:20 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
20:19 urbanecm@deploy1002: Started scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666)
20:17 urbanecm@deploy1002: Finished scap: Backport for Disable the Nearby feature on some sister projects (T341133) (duration: 13m 12s)
20:05 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Disable the Nearby feature on some sister projects (T341133) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
20:04 urbanecm@deploy1002: Started scap: Backport for Disable the Nearby feature on some sister projects (T341133)
18:36 sukhe: re-enable puppet in A:dns-rec to finish merging CR 933497 and run-agent: T340479
18:25 denisse: disable puppet on webperf1003 to test PHP memory changes for XHGui
18:25 denisse: disable puppet on webperf1003
18:20 sukhe: disable puppet on A:dns-rec to merge CR 933497
17:40 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
17:07 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
17:07 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
17:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
17:02 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
16:33 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
16:25 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
16:06 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
15:55 fabfur: re-enabled puppet in all cp- hosts (done @2023-07-05 14:22:57 UTC)
15:38 mlitn@deploy1002: Finished deploy [airflow-dags/platform_eng@a97da10]: (no justification provided) (duration: 00m 25s)
15:38 mlitn@deploy1002: Started deploy [airflow-dags/platform_eng@a97da10]: (no justification provided)
15:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
15:26 sukhe: reprepro -C component/dnsdist include bullseye-wikimedia dnsdist_1.8.0-1+wmf11u1_amd64.changes
15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
15:23 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
15:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1024.eqiad.wmnet
15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
15:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
15:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
14:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
14:48 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
14:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
14:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
14:42 sukhe: re-enable puppet and start pybal on lvs2013
14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
14:38 vgutierrez: pool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
14:29 Lucas_WMDE: UTC afternoon backport+config window done
14:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:24 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:24 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:18 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
14:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:18 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:18 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
14:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:17 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
14:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
14:16 vgutierrez: depool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
14:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
14:16 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
14:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:11 fabfur: disabling puppet in all cp- hosts due to a configuration error
14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
14:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
13:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1021.eqiad.wmnet
13:55 elukey: expand kafka topic partitions from 1 to 5 for {codfw,eqiad}.mediawiki.job.RecordLintJob and {eqiad,codfw}.mediawiki.job.refreshLinks on kafka-main eqiad/codfw - T338357
13:53 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) (duration: 08m 54s)
13:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
13:45 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and anzx: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:44 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105)
13:41 sukhe: disable puppet and stop pybal on lvs2013: T340960
13:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
13:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
13:02 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1019.eqiad.wmnet
12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
12:34 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
12:34 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
12:30 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
12:28 aborrero@cumin2002: START - Cookbook sre.dns.netbox
12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1001.wikimedia.org
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1001.wikimedia.org
11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1002.wikimedia.org
11:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:47 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:47 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
11:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
11:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:39 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
11:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1002.wikimedia.org
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2002.wikimedia.org
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:32 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
11:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
11:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2002.wikimedia.org
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2001.wikimedia.org
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
11:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
11:06 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:01 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
11:00 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
11:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2001.wikimedia.org
10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
10:52 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
10:41 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
10:40 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
10:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
10:22 godog: restore US business hours escalation - T340763
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
10:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
10:05 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
09:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
09:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
09:41 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:41 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:40 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
09:39 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:39 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
09:38 aborrero@cumin1001: START - Cookbook sre.dns.netbox
09:37 claime: running puppet on 'A:cp-text and P:trafficserver::backend' - T341078
09:36 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:36 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
09:35 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:35 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
09:31 claime: Sending 0.5% of global traffic to mw-on-k8s - T341078
09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
09:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow1002.eqiad.wmnet to drbd
09:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:26 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 19s)
09:25 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:24 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:24 claime: redeploy mw-on-k8s following quota update - T341114
09:24 cgoubert@deploy1002: Started scap: (no justification provided)
09:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:21 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:21 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
09:19 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow1002.eqiad.wmnet to drbd
09:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
09:11 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:10 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:09 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:09 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
09:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
09:07 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:07 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:07 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
09:06 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:06 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
09:02 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:02 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
09:01 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
09:01 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
08:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
08:53 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:52 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
08:52 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:44 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.16 refs T340244
08:40 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
08:40 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:39 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
08:34 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
08:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
08:26 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
08:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
08:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
08:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
08:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
08:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
00:09 zabe@deploy1002: Finished scap: update interwiki cache (duration: 06m 51s)
00:02 zabe@deploy1002: Started scap: update interwiki cache

2023-07-04

23:58 zabe@deploy1002: Finished scap: T335969 (duration: 07m 40s)
23:52 zabe@deploy1002: zabe: T335969 synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
23:50 zabe@deploy1002: Started scap: T335969
23:50 zabe: create Wikipedia Ghanaian Pidgin # T335969
22:57 zabe@deploy1002: Finished scap: Backport for Remove migrateStewards.php reference (duration: 07m 23s)
22:52 zabe@deploy1002: taavi and zabe: Backport for Remove migrateStewards.php reference synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
22:50 zabe@deploy1002: Started scap: Backport for Remove migrateStewards.php reference
22:46 zabe@deploy1002: Finished scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) (duration: 07m 56s)
22:39 zabe@deploy1002: zabe: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
22:38 zabe@deploy1002: Started scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954)
19:38 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
19:37 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:37 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
19:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49512 and previous config saved to /var/cache/conftool/dbconfig/20230704-192646-ladsgroup.json
19:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:23 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
19:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49511 and previous config saved to /var/cache/conftool/dbconfig/20230704-191142-ladsgroup.json
19:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
19:07 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
19:07 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
19:01 jgleeson: payments-wiki upgraded from cbc0b454 to d76b9085
18:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49510 and previous config saved to /var/cache/conftool/dbconfig/20230704-185637-ladsgroup.json
18:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49509 and previous config saved to /var/cache/conftool/dbconfig/20230704-184132-ladsgroup.json
18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
18:38 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2165 T339223', diff saved to https://phabricator.wikimedia.org/P49508 and previous config saved to /var/cache/conftool/dbconfig/20230704-183748-ladsgroup.json
18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T339223', diff saved to https://phabricator.wikimedia.org/P49507 and previous config saved to /var/cache/conftool/dbconfig/20230704-183434-ladsgroup.json
18:32 Amir1: Starting s8 codfw failover from db2165 to db2161 - T339223
18:31 sukhe: finished running homer for adding fabfur [pushed to all 55 devices successfully]
18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
18:06 sukhe: enable puppet on A:wikidough to roll out CR 863295
18:01 sukhe: disable puppet on A:wikidough to roll out CR 863295
17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T339223', diff saved to https://phabricator.wikimedia.org/P49506 and previous config saved to /var/cache/conftool/dbconfig/20230704-175604-ladsgroup.json
17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
17:36 sukhe: [correction] homer "*" commit "Gerrit: 935479 add fabfur"
17:36 sukhe: homer "*" commit "Gerrit: 935479 add fabur"
16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
16:03 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
16:03 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
16:03 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard,name=codfw
16:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
16:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
15:57 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.wikimedia.org on all recursors
15:57 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.wikimedia.org on all recursors
15:56 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard,name=codfw
15:46 Emperor: delete swift container global-data-elastic-backups in AUTH_search account T341081
15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
15:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
15:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
15:25 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
15:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
15:19 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
15:12 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:08 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.wikimedia.org on all recursors
15:08 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.wikimedia.org on all recursors
15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
15:03 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
15:01 aborrero@cumin1001: START - Cookbook sre.dns.netbox
15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
15:00 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
14:58 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:58 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
14:56 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:56 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:53 claime: Deploying encrypted rsync to deployment servers - T289857
14:52 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:52 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:50 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:50 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:49 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:49 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:46 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard-next,name=codfw
14:43 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 12s)
14:42 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:42 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:41 cgoubert@deploy1002: Started scap: (no justification provided)
14:41 claime: redeploying mw-on-k8s
14:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:39 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
14:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
14:36 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:36 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
14:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
14:33 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:33 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:32 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
14:31 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:27 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
14:27 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
14:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:20 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:20 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
14:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:16 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
14:16 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
14:16 Lucas_WMDE: UTC afternoon backport+config window done
14:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 18m 41s)
14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:03 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
14:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
14:02 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
14:01 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
14:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
14:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
14:00 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
14:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
14:00 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
14:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
13:59 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and urbanecm: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:58 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
13:58 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
13:57 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
13:55 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
13:55 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
13:51 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:50 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:50 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:50 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox: apply
13:50 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:45 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams: apply
13:44 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 08m 25s)
13:40 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/zotero: apply
13:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
13:38 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/wikifeeds: apply
13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/toolhub: apply
13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/termbox: apply
13:37 lucaswerkmeister-wmde@deploy1002: urbanecm and lucaswerkmeister-wmde: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/tegola-vector-tiles: apply
13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/similar-users: apply
13:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
13:35 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-timeline: apply
13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-syntaxhighlight: apply
13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-media: apply
13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-constraints: apply
13:33 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/recommendation-api: apply
13:32 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
13:32 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/rdf-streaming-updater: apply
13:32 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
13:31 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
13:31 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
13:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
13:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
13:27 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/push-notifications: apply
13:26 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/proton: apply
13:22 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/mobileapps: apply
13:18 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/miscweb: apply
13:17 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/machinetranslation: apply
13:11 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/linkrecommendation: apply
13:10 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/image-suggestion: apply
13:09 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams-internal: apply
13:08 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-main: apply
13:05 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-logging-external: apply
13:03 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics-external: apply
13:02 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics: apply
13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/zotero: apply
13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/wikifeeds: apply
13:01 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/echostore: apply
13:00 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/device-analytics: apply
12:59 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/developer-portal: apply
12:57 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/cxserver: apply
12:56 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/citoid: apply
12:55 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/toolhub: apply
12:55 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/blubberoid: apply
12:54 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/termbox: apply
12:53 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/tegola-vector-tiles: apply
12:51 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:51 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:50 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:50 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/similar-users: apply
12:49 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:49 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-timeline: apply
12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-syntaxhighlight: apply
12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-media: apply
12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-constraints: apply
12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox: apply
12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/recommendation-api: apply
12:46 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/rdf-streaming-updater: apply
12:45 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/push-notifications: apply
12:44 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/proton: apply
12:42 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/mobileapps: apply
12:41 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/miscweb: apply
12:40 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/machinetranslation: apply
12:34 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/linkrecommendation: apply
12:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
12:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
12:30 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/image-suggestion: apply
12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams-internal: apply
12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams: apply
12:28 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-main: apply
12:27 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-logging-external: apply
12:27 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
12:26 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
12:25 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics-external: apply
12:24 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics: apply
12:23 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/echostore: apply
12:22 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/device-analytics: apply
12:21 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/developer-portal: apply
12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/cxserver: apply
12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/citoid: apply
12:19 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/blubberoid: apply
12:18 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/apertium: apply
12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
12:12 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
11:55 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/similar-users: apply
11:53 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/miscweb: apply
11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/toolhub: apply
11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox: apply
11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams-internal: apply
11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-main: apply
11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-syntaxhighlight: apply
11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics: apply
11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/linkrecommendation: apply
11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/mobileapps: apply
11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/device-analytics: apply
11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/citoid: apply
11:42 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-media: apply
11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/zotero: apply
11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/cxserver: apply
11:40 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics-external: apply
11:33 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/apertium: apply
11:32 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/tegola-vector-tiles: apply
11:31 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-timeline: apply
11:29 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-constraints: apply
11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/api-gateway: apply
11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/wikifeeds: apply
11:27 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/termbox: apply
11:26 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/blubberoid: apply
11:26 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/similar-users: apply
11:17 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:17 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:12 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/push-notifications: apply
11:10 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rdf-streaming-updater: apply
11:08 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:08 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:04 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:04 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:03 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:03 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
10:53 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/echostore: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/image-suggestion: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/recommendation-api: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/machinetranslation: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/sessionstore: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/developer-portal: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-logging-external: -i apply
10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
10:20 jayme@deploy1002: helmfile [staging] FAIL (3) helmfile.d/services/mw-api-int: -i apply
10:20 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
10:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
10:04 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.16 refs T340244
09:55 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.13 (duration: 02m 11s)
09:52 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.16 refs T340244 (duration: 50m 51s)
09:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
09:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
09:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
09:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
09:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
09:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
09:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
09:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
09:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
09:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
09:42 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:41 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
09:38 jayme: updated envoyproxy to 1.23.10 on all nodes - T300324
09:37 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
09:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
09:36 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
09:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.16 refs T340244
08:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.

2023-07-03

22:00 eileen: civicrm upgraded from 9e04c92d to 4ca2008d
20:18 jiji@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
20:15 effie: restarting swift proxies
20:14 jiji@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
19:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
19:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
19:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
19:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
19:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
19:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
16:10 effie: restarting pybal on lvs2013
16:04 effie: restarting pybal on lvs2014
15:57 effie: restarting pybal on lvs2014
15:52 effie: restarting pybal on lvs1019
15:49 effie: restarting pybal on lvs1020
15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster2002.codfw.wmnet
15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster1002.eqiad.wmnet
15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster2002.codfw.wmnet
15:14 jiji@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=kubernetes-staging,service=kubemaster
15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1001.eqiad.wmnet
15:09 moritzm: installing Java 8 security updates on Hadoop systems
14:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2002.codfw.wmnet with OS bullseye
14:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
14:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
13:50 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
13:37 moritzm: installing openjdk-8 security updates
13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
13:28 urbanecm: UTC afternoon B&C window done
13:27 urbanecm@deploy1002: Finished scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) (duration: 08m 32s)
13:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
13:22 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
13:22 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
13:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
13:21 urbanecm: Run `wikiadmin2023@10.64.16.184(idwiki)> DELETE FROM `category` WHERE cat_title = ; ` (T336780)
13:20 urbanecm@deploy1002: func and urbanecm: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
13:19 urbanecm@deploy1002: Started scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929)
13:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
13:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
13:10 urbanecm@deploy1002: Finished scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) (duration: 08m 09s)
13:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
13:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
13:04 urbanecm@deploy1002: jhsoby and urbanecm: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
13:02 urbanecm@deploy1002: Started scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981)
13:00 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
12:59 kart_: Updated MinT to 2023-06-29-061037-production (T340709 + Fixed repetition with Santali)
12:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
12:51 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
12:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
12:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
12:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
12:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
12:33 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
12:23 kart_: Updated cxserver to 2023-07-03-045311-production (T285217)
12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
12:18 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
12:17 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 1271 hosts
12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 1271 hosts
12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 20 hosts
12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 20 hosts
12:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
12:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 760 hosts
12:11 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 760 hosts
12:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
12:08 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 760 hosts
12:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 760 hosts
12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 1271 hosts
12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 1271 hosts
12:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 20 hosts
12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 20 hosts
11:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:35 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:35 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
11:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 20 hosts
11:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 20 hosts
11:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 1271 hosts
11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 1271 hosts
11:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 760 hosts
11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 760 hosts
11:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:16 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
11:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
11:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 760 hosts
11:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 760 hosts
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1271 hosts
11:09 jiji@cumin1001: START - Cookbook sre.dns.netbox
11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1271 hosts
11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 20 hosts
11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 20 hosts
11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 20 hosts
11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 20 hosts
10:53 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
10:53 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
10:53 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
10:53 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
10:52 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
10:52 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
10:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
10:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
10:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
10:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:48 topranks: Re-activating Vodafone DE peering at AMS-IX T340670
10:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:42 jayme: imported envoyproxy 1.23.10 to buster-wikimedia, bullseye-wikimedia, bookworm-wikimedia - T300324
10:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:18 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
10:17 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
10:03 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
09:58 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
09:58 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
09:58 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
09:58 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
09:57 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
09:57 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
09:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 1271 hosts
09:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 1271 hosts
09:37 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 760 hosts
09:37 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 760 hosts
09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
09:34 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new everywhere except commons (T335343) (duration: 10m 46s)
09:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
09:34 volans@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
09:25 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new everywhere except commons (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
09:24 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new everywhere except commons (T335343)
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 760 hosts
09:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 760 hosts
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 1271 hosts
09:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 1271 hosts
09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 20 hosts
09:17 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 20 hosts
09:13 lucaswerkmeister-wmde: Deployed security patch for T339016
09:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Barakat Ajadi out of all services on: 4 hosts
09:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Barakat Ajadi out of all services on: 4 hosts
08:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 20 hosts
08:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 20 hosts
08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 1271 hosts
08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 1271 hosts
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 760 hosts
08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 760 hosts
08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 760 hosts
08:54 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 760 hosts
08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 1271 hosts
08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 1271 hosts
08:53 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 20 hosts
08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 20 hosts
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 20 hosts
08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 20 hosts
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 1271 hosts
08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 1271 hosts
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 760 hosts
08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 760 hosts
08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 760 hosts
08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 760 hosts
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 1271 hosts
08:50 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 1271 hosts
08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 20 hosts
08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 20 hosts
08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 1271 hosts
08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 1271 hosts
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 760 hosts
08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 760 hosts
08:48 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 760 hosts
08:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 760 hosts
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 1271 hosts
08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 1271 hosts
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 20 hosts
08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 20 hosts
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 1271 hosts
08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 1271 hosts
08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 760 hosts
08:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 760 hosts
08:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
08:33 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
07:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
07:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
07:32 taavi@deploy1002: Finished scap: Backport for Update plwiki autopromote per consensus (T340397) (duration: 07m 48s)
07:25 taavi@deploy1002: msz2001 and taavi: Backport for Update plwiki autopromote per consensus (T340397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
07:24 taavi@deploy1002: Started scap: Backport for Update plwiki autopromote per consensus (T340397)
07:22 taavi@deploy1002: Finished scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) (duration: 18m 21s)
07:13 taavi@deploy1002: soda and taavi: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
07:04 taavi@deploy1002: Started scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847)

2023-07-01

08:20 hashar@deploy1002: Finished scap: Backport for DeleteAction: Call setAction for file revision delete (T340821) (duration: 09m 17s)
08:12 hashar@deploy1002: hashar: Backport for DeleteAction: Call setAction for file revision delete (T340821) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
08:11 hashar@deploy1002: Started scap: Backport for DeleteAction: Call setAction for file revision delete (T340821)

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sept
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec

2020s