Server Admin Log/Archive 68

From Wikitech

2023-07-31

  • 23:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49860 and previous config saved to /var/cache/conftool/dbconfig/20230731-235442-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T342617)', diff saved to https://phabricator.wikimedia.org/P49859 and previous config saved to /var/cache/conftool/dbconfig/20230731-233039-ladsgroup.json
  • 23:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49858 and previous config saved to /var/cache/conftool/dbconfig/20230731-233018-ladsgroup.json
  • 23:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49857 and previous config saved to /var/cache/conftool/dbconfig/20230731-231512-ladsgroup.json
  • 23:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P49856 and previous config saved to /var/cache/conftool/dbconfig/20230731-230006-ladsgroup.json
  • 22:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49855 and previous config saved to /var/cache/conftool/dbconfig/20230731-224500-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1223 (T342617)', diff saved to https://phabricator.wikimedia.org/P49854 and previous config saved to /var/cache/conftool/dbconfig/20230731-223547-ladsgroup.json
  • 22:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49853 and previous config saved to /var/cache/conftool/dbconfig/20230731-223526-ladsgroup.json
  • 22:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49852 and previous config saved to /var/cache/conftool/dbconfig/20230731-222020-ladsgroup.json
  • 22:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P49851 and previous config saved to /var/cache/conftool/dbconfig/20230731-220514-ladsgroup.json
  • 21:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49850 and previous config saved to /var/cache/conftool/dbconfig/20230731-215008-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T342617)', diff saved to https://phabricator.wikimedia.org/P49849 and previous config saved to /var/cache/conftool/dbconfig/20230731-213017-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 21:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49848 and previous config saved to /var/cache/conftool/dbconfig/20230731-212941-ladsgroup.json
  • 21:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49847 and previous config saved to /var/cache/conftool/dbconfig/20230731-211435-ladsgroup.json
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P49846 and previous config saved to /var/cache/conftool/dbconfig/20230731-205928-ladsgroup.json
  • 20:45 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49845 and previous config saved to /var/cache/conftool/dbconfig/20230731-204422-ladsgroup.json
  • 20:37 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1212 (T342617)', diff saved to https://phabricator.wikimedia.org/P49844 and previous config saved to /var/cache/conftool/dbconfig/20230731-203451-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49843 and previous config saved to /var/cache/conftool/dbconfig/20230731-203413-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49842 and previous config saved to /var/cache/conftool/dbconfig/20230731-201907-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P49841 and previous config saved to /var/cache/conftool/dbconfig/20230731-200401-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49840 and previous config saved to /var/cache/conftool/dbconfig/20230731-194854-ladsgroup.json
  • 19:03 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@47f9458]: (no justification provided) (duration: 00m 16s)
  • 19:03 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@47f9458]: (no justification provided)
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T342617)', diff saved to https://phabricator.wikimedia.org/P49839 and previous config saved to /var/cache/conftool/dbconfig/20230731-184200-ladsgroup.json
  • 18:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49838 and previous config saved to /var/cache/conftool/dbconfig/20230731-184140-ladsgroup.json
  • 18:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49837 and previous config saved to /var/cache/conftool/dbconfig/20230731-182633-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T342617)', diff saved to https://phabricator.wikimedia.org/P49836 and previous config saved to /var/cache/conftool/dbconfig/20230731-182114-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 18:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P49835 and previous config saved to /var/cache/conftool/dbconfig/20230731-181127-ladsgroup.json
  • 18:00 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:59 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49834 and previous config saved to /var/cache/conftool/dbconfig/20230731-175621-ladsgroup.json
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1108.eqiad.wmnet
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:02 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1108.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 17:00 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 17:00 btullis@cumin1001: Added views for new wiki: gpewiki T338678
  • 16:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T342617)', diff saved to https://phabricator.wikimedia.org/P49833 and previous config saved to /var/cache/conftool/dbconfig/20230731-164759-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:47 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49832 and previous config saved to /var/cache/conftool/dbconfig/20230731-164738-ladsgroup.json
  • 16:42 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts db1108.eqiad.wmnet
  • 16:34 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49831 and previous config saved to /var/cache/conftool/dbconfig/20230731-163232-ladsgroup.json
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1003.eqiad.wmnet
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 16:28 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1001"
  • 16:25 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 16:19 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts an-airflow1003.eqiad.wmnet
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P49830 and previous config saved to /var/cache/conftool/dbconfig/20230731-161726-ladsgroup.json
  • 16:08 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 24s)
  • 16:07 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 16:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49829 and previous config saved to /var/cache/conftool/dbconfig/20230731-160500-ladsgroup.json
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49828 and previous config saved to /var/cache/conftool/dbconfig/20230731-160220-ladsgroup.json
  • 16:01 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 44s)
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49827 and previous config saved to /var/cache/conftool/dbconfig/20230731-154954-ladsgroup.json
  • 15:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P49826 and previous config saved to /var/cache/conftool/dbconfig/20230731-153448-ladsgroup.json
  • 15:20 volans: deploying python3-wmflib fleet wide
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49825 and previous config saved to /var/cache/conftool/dbconfig/20230731-151942-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T342617)', diff saved to https://phabricator.wikimedia.org/P49824 and previous config saved to /var/cache/conftool/dbconfig/20230731-145252-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49823 and previous config saved to /var/cache/conftool/dbconfig/20230731-145232-ladsgroup.json
  • 14:47 sukhe: finished rolling out gdnsd 3.99.0~alpha2 upgrade
  • 14:45 fabfur: imported prometheus-rdkafka-exporter package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/prometheus-rdkafka-exporter/+/942613) T342154
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49821 and previous config saved to /var/cache/conftool/dbconfig/20230731-143725-ladsgroup.json
  • 14:32 fabfur: imported file-read-backwards package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/debs/file-read-backwards/+/942491) T342154
  • 14:31 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:26 volans: uploaded python3-wmflib_1.2.3 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 14:25 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P49820 and previous config saved to /var/cache/conftool/dbconfig/20230731-142220-ladsgroup.json
  • 14:21 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 14:10 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:09 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) (duration: 07m 27s)
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49819 and previous config saved to /var/cache/conftool/dbconfig/20230731-140713-ladsgroup.json
  • 14:05 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:03 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:03 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:02 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add WF as alias for NS_PROJECT (and WT for its talk) (T342964)
  • 13:59 sukhe: reprepro -C component/dnsdist include bookworm-wikimedia dnsdist_1.8.0-1+wmf12u1_amd64.changes: T342154
  • 13:57 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) (duration: 07m 38s)
  • 13:57 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:52 sukhe: reprepro -C main include bullseye-wikimedia gdnsd_3.99.0~alpha2-1_amd64.changes
  • 13:51 jforrester@deploy1002: jforrester: Continuing with sync
  • 13:51 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:50 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Disable the Collection extension for now, broken (T342931)
  • 13:49 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: T325908 (duration: 06m 25s)
  • 13:45 moritzm: install gtk+3.0 bugfix updates from Bullseye 11.7 point release
  • 13:44 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:42 fabfur: imported fifo-log-demux package into bookworm-wikimedia (https://gerrit.wikimedia.org/r/c/operations/software/fifo-log-demux/+/942414) T342154
  • 13:38 jforrester@deploy1002: Finished scap: Backport for Remove F: namespace alias (T325910) (duration: 24m 24s)
  • 13:37 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 100%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49818 and previous config saved to /var/cache/conftool/dbconfig/20230731-133707-root.json
  • 13:29 jforrester@deploy1002: jforrester and epicpupper: Continuing with sync
  • 13:29 jforrester@deploy1002: jforrester and epicpupper: Backport for Remove F: namespace alias (T325910) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:24 moritzm: imported jenkins 2.401.3 to thirdparty/ci for bullseye-wikimedia T342572
  • 13:22 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 75%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49817 and previous config saved to /var/cache/conftool/dbconfig/20230731-132201-root.json
  • 13:14 jforrester@deploy1002: Started scap: Backport for Remove F: namespace alias (T325910)
  • 13:13 James_F: WikiLambda backport verified for T342891 T342687 T341500 T343006 T342901 and T343041
  • 13:09 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: (no justification provided) (duration: 07m 16s)
  • 13:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 50%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49816 and previous config saved to /var/cache/conftool/dbconfig/20230731-130657-root.json
  • 13:00 jnuche: CI Jenkins upgraded to 2.401.3: https://phabricator.wikimedia.org/T342572
  • 12:57 moritzm: installing 6.1.38 kernels on Bookworm hosts
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T342617)', diff saved to https://phabricator.wikimedia.org/P49815 and previous config saved to /var/cache/conftool/dbconfig/20230731-125513-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1196.eqiad.wmnet with reason: Maint
  • 12:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1196 T342284', diff saved to https://phabricator.wikimedia.org/P49814 and previous config saved to /var/cache/conftool/dbconfig/20230731-125252-ladsgroup.json
  • 12:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 25%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49813 and previous config saved to /var/cache/conftool/dbconfig/20230731-125152-root.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2127 (T342617)', diff saved to https://phabricator.wikimedia.org/P49812 and previous config saved to /var/cache/conftool/dbconfig/20230731-124912-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49811 and previous config saved to /var/cache/conftool/dbconfig/20230731-124851-ladsgroup.json
  • 12:36 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 10%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49810 and previous config saved to /var/cache/conftool/dbconfig/20230731-123647-root.json
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49809 and previous config saved to /var/cache/conftool/dbconfig/20230731-123345-ladsgroup.json
  • 12:32 moritzm: installing xapian-core bugfix updates on Bullseye
  • 12:23 moritzm: installing mariadb-10.5 updates from Bullseye 11.7 point release (libs/tools, unrelated to wmf-mariadb packages)
  • 12:21 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 5%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49808 and previous config saved to /var/cache/conftool/dbconfig/20230731-122142-root.json
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P49807 and previous config saved to /var/cache/conftool/dbconfig/20230731-121839-ladsgroup.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 3%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49806 and previous config saved to /var/cache/conftool/dbconfig/20230731-120638-root.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49805 and previous config saved to /var/cache/conftool/dbconfig/20230731-120332-ladsgroup.json
  • 11:51 marostegui@cumin1001: dbctl commit (dc=all): 'db2114 (re)pooling @ 1%: Repooling after migration', diff saved to https://phabricator.wikimedia.org/P49804 and previous config saved to /var/cache/conftool/dbconfig/20230731-115133-root.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2114 T334650', diff saved to https://phabricator.wikimedia.org/P49803 and previous config saved to /var/cache/conftool/dbconfig/20230731-114645-root.json
  • 11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:11 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 11:11 btullis@cumin1001: Added views for new wiki: wikifunctionswiki T289316
  • 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:55 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 10:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:45 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 10:36 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,name=parse1002.eqiad.wmnet
  • 10:36 claime: Repooling parse1002 following CPU replacement - T339340
  • 10:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for parse1002.eqiad.wmnet
  • 10:34 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for parse1002.eqiad.wmnet
  • 10:28 _joe_: disabling puppet on mwdebug2002, testing noc.wikimedia.org
  • 10:20 moritzm: installing bind9 security updates (client-side tools/libs)
  • 10:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:02 btullis@cumin1001: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0)
  • 10:02 btullis@cumin1001: Added views for new wiki: btmwiktionary T342670
  • 09:56 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:54 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:51 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115) (duration: 23m 00s)
  • 09:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:45 ladsgroup@deploy1002: ladsgroup and isaranto: Continuing with sync
  • 09:41 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,name=parse1002.eqiad.wmnet
  • 09:37 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:36 btullis@cumin1001: START - Cookbook sre.wikireplicas.add-wiki
  • 09:32 urbanecm: Unblock stuck global rename by running `extensions/CentralAuth/maintenance/fixStuckGlobalRename.php` (T343099)
  • 09:29 ladsgroup@deploy1002: ladsgroup and isaranto: Backport for ores-extension: enable Lift Wing for most wikis (T342115) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:29 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:28 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable Lift Wing for most wikis (T342115)
  • 09:28 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:27 ladsgroup@deploy1002: Finished scap: Backport for Remove ak from wgImportSources (T333765) (duration: 08m 10s)
  • 09:21 ladsgroup@deploy1002: amire80 and ladsgroup: Continuing with sync
  • 09:20 ladsgroup@deploy1002: amire80 and ladsgroup: Backport for Remove ak from wgImportSources (T333765) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 09:19 ladsgroup@deploy1002: Started scap: Backport for Remove ak from wgImportSources (T333765)
  • 09:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 09:10 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1016.eqiad.wmnet with reason: host reimage
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 08:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:52 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:42 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T342617)', diff saved to https://phabricator.wikimedia.org/P49802 and previous config saved to /var/cache/conftool/dbconfig/20230731-083941-ladsgroup.json
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 08:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 07:21 taavi@deploy1002: Finished scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) (duration: 17m 02s)
  • 07:15 taavi@deploy1002: anzx and taavi: Continuing with sync
  • 07:13 taavi@deploy1002: anzx and taavi: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:04 taavi@deploy1002: Started scap: Backport for ruwikibooks: Set wgRestrictDisplayTitle to false (T342800)
  • 06:06 moritzm: imported jenkins 2.401.3 to thirdparty/ci for buster-wikimedia T342572

2023-07-29

  • 16:59 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1130 T343077', diff saved to https://phabricator.wikimedia.org/P49801 and previous config saved to /var/cache/conftool/dbconfig/20230729-165954-root.json
  • 16:58 marostegui@cumin1001: dbctl commit (dc=all): 'Promote db1183 to s5 primary T343077', diff saved to https://phabricator.wikimedia.org/P49800 and previous config saved to /var/cache/conftool/dbconfig/20230729-165813-root.json
  • 16:57 marostegui@cumin1001: dbctl commit (dc=all): 'Emergency switchover T343077', diff saved to https://phabricator.wikimedia.org/P49799 and previous config saved to /var/cache/conftool/dbconfig/20230729-165748-root.json
  • 16:57 marostegui: Starting emergency s5 eqiad failover from db1130 to db1183 - T343077 T343076
  • 16:36 marostegui@cumin1001: dbctl commit (dc=all): 'Set db1183 with weight 0 T343077', diff saved to https://phabricator.wikimedia.org/P49798 and previous config saved to /var/cache/conftool/dbconfig/20230729-163621-root.json
  • 16:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
  • 16:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T343077
  • 16:19 _joe_: set read_only=0 on db1130
  • 16:15 _joe_: systemctl start mariadb.service on db1130

2023-07-28

  • 22:17 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions (duration: 00m 21s)
  • 22:16 milimetric@deploy1002: Started deploy [airflow-dags/analytics@1ff1629]: Updating webrequest refine to include wikifunctions
  • 22:03 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page (duration: 00m 03s)
  • 22:03 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae] (thin): Fix wikifunction special page
  • 22:00 milimetric@deploy1002: Finished deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page (duration: 10m 18s)
  • 21:50 milimetric@deploy1002: Started deploy [analytics/refinery@f7e74ae]: Fix wikifunction special page
  • 20:12 milimetric@deploy1002: Finished deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19 (duration: 16m 53s)
  • 19:55 milimetric@deploy1002: Started deploy [analytics/refinery@53db2ca]: Publish refinery-source-0.2.19
  • 19:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469 (duration: 00m 14s)
  • 19:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@4d8c3db]: Deploying T342926 and https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/469
  • 18:12 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=brwikimedia --logwiki=metawiki 'Viniciuspontesoficial' 'Eusouvinipontes' # T343013
  • 16:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:54 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:44 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:42 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:34 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:34 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:32 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:27 kamila@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:26 kamila@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:26 kamila_: k8s: delete and recreate the benthos-cache-invalidator namespace
  • 15:25 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:25 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:40 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:25 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions (duration: 00m 03s)
  • 14:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 14:25 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12] (thin): Patch sqoop of wikifunctions
  • 14:23 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:23 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 14:21 milimetric@deploy1002: Finished deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions (duration: 06m 11s)
  • 14:15 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:15 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:15 milimetric@deploy1002: Started deploy [analytics/refinery@1523f12]: Patch sqoop of wikifunctions
  • 14:14 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 14:03 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 13:30 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 12:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 12:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 10:51 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:50 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:45 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:45 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:44 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:41 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:39 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 10:38 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 10:36 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 10:33 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 10:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 10:29 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:28 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:25 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:18 dcausse: T342924: created search indices for wikifunctions
  • 10:00 aikochou@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:57 aikochou@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 09:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 00:20 tgr@deploy1002: Finished scap: Backport for help: Fix navigation in the help panel (T342927) (duration: 10m 09s)
  • 00:14 tgr@deploy1002: tgr: Continuing with sync
  • 00:11 tgr@deploy1002: tgr: Backport for help: Fix navigation in the help panel (T342927) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 00:10 tgr@deploy1002: Started scap: Backport for help: Fix navigation in the help panel (T342927)

2023-07-27

  • 21:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49790 and previous config saved to /var/cache/conftool/dbconfig/20230727-214302-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49789 and previous config saved to /var/cache/conftool/dbconfig/20230727-212756-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P49788 and previous config saved to /var/cache/conftool/dbconfig/20230727-211250-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49787 and previous config saved to /var/cache/conftool/dbconfig/20230727-205744-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1224 (T342617)', diff saved to https://phabricator.wikimedia.org/P49786 and previous config saved to /var/cache/conftool/dbconfig/20230727-203435-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49785 and previous config saved to /var/cache/conftool/dbconfig/20230727-203415-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49784 and previous config saved to /var/cache/conftool/dbconfig/20230727-201908-ladsgroup.json
  • 20:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316', diff saved to https://phabricator.wikimedia.org/P49783 and previous config saved to /var/cache/conftool/dbconfig/20230727-200402-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49782 and previous config saved to /var/cache/conftool/dbconfig/20230727-194856-ladsgroup.json
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1014.eqiad.wmnet with OS bullseye
  • 19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
  • 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1014.eqiad.wmnet with reason: host reimage
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1213:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49781 and previous config saved to /var/cache/conftool/dbconfig/20230727-190637-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49780 and previous config saved to /var/cache/conftool/dbconfig/20230727-190617-ladsgroup.json
  • 18:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49779 and previous config saved to /var/cache/conftool/dbconfig/20230727-185110-ladsgroup.json
  • 18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files (duration: 00m 04s)
  • 18:41 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de] (thin): Deploying to sync script updates and static files
  • 18:41 milimetric@deploy1002: Finished deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files (duration: 08m 25s)
  • 18:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P49778 and previous config saved to /var/cache/conftool/dbconfig/20230727-183604-ladsgroup.json
  • 18:33 milimetric@deploy1002: Started deploy [analytics/refinery@1af57de]: Deploying to sync script updates and static files
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49777 and previous config saved to /var/cache/conftool/dbconfig/20230727-182058-ladsgroup.json
  • 18:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 18:12 krinkle@deploy1002: Finished deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714) (duration: 00m 05s)
  • 18:12 krinkle@deploy1002: Started deploy [performance/navtiming@c868e79]: Rename FID labels (Ibab711), Remove QuickSurveys (T336169), Add Vietnam (T340714)
  • 18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host rdb1013.eqiad.wmnet with OS bullseye
  • 18:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T342617)', diff saved to https://phabricator.wikimedia.org/P49776 and previous config saved to /var/cache/conftool/dbconfig/20230727-175659-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49775 and previous config saved to /var/cache/conftool/dbconfig/20230727-175638-ladsgroup.json
  • 17:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
  • 17:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49774 and previous config saved to /var/cache/conftool/dbconfig/20230727-174132-ladsgroup.json
  • 17:41 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on rdb1013.eqiad.wmnet with reason: host reimage
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P49773 and previous config saved to /var/cache/conftool/dbconfig/20230727-172626-ladsgroup.json
  • 17:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 17:04 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:03 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:03 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:02 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:52 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) (duration: 20m 53s)
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T342617)', diff saved to https://phabricator.wikimedia.org/P49771 and previous config saved to /var/cache/conftool/dbconfig/20230727-164711-ladsgroup.json
  • 16:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49770 and previous config saved to /var/cache/conftool/dbconfig/20230727-164650-ladsgroup.json
  • 16:45 dancy@deploy1002: Installation of scap version "4.57.0" completed for 600 hosts
  • 16:44 dancy@deploy1002: Installing scap version "4.57.0" for 600 hosts
  • 16:41 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:41 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 16:34 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 16:34 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 16:33 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on itwiki and hewiki (T342115) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:31 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on itwiki and hewiki (T342115)
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49769 and previous config saved to /var/cache/conftool/dbconfig/20230727-163144-ladsgroup.json
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P49768 and previous config saved to /var/cache/conftool/dbconfig/20230727-161638-ladsgroup.json
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49766 and previous config saved to /var/cache/conftool/dbconfig/20230727-160132-ladsgroup.json
  • 15:56 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:56 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:49 jynus: restart db2097
  • 15:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 15:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 15:45 zabe@deploy1002: Finished scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) (duration: 07m 43s)
  • 15:44 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:40 kamila@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:38 zabe@deploy1002: zabe and dreamyjazz: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:37 zabe@deploy1002: Started scap: Backport for Revert "CheckUser event table migration: Write new on group0" (T342902)
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49764 and previous config saved to /var/cache/conftool/dbconfig/20230727-153649-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49763 and previous config saved to /var/cache/conftool/dbconfig/20230727-153629-ladsgroup.json
  • 15:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49762 and previous config saved to /var/cache/conftool/dbconfig/20230727-152123-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P49761 and previous config saved to /var/cache/conftool/dbconfig/20230727-150616-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49759 and previous config saved to /var/cache/conftool/dbconfig/20230727-145110-ladsgroup.json
  • 14:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1173 (T342617)', diff saved to https://phabricator.wikimedia.org/P49758 and previous config saved to /var/cache/conftool/dbconfig/20230727-142721-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49757 and previous config saved to /var/cache/conftool/dbconfig/20230727-142700-ladsgroup.json
  • 14:26 cmooney@cumin1001: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:25 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 14:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for ssw1-a1-codfw - cmooney@cumin1001"
  • 14:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:22 cmooney@cumin1001: START - Cookbook sre.network.provision for device ssw1-a1-codfw.mgmt.codfw.wmnet
  • 14:20 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:19 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49756 and previous config saved to /var/cache/conftool/dbconfig/20230727-141154-ladsgroup.json
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P49755 and previous config saved to /var/cache/conftool/dbconfig/20230727-135648-ladsgroup.json
  • 13:55 fabfur: done restarting lvs6002 (T335835)
  • 13:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6002.drmrs.wmnet
  • 13:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6002.drmrs.wmnet
  • 13:49 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/benthos-cache-invalidator: apply
  • 13:49 kamila@deploy1002: helmfile [staging] START helmfile.d/services/benthos-cache-invalidator: apply
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49754 and previous config saved to /var/cache/conftool/dbconfig/20230727-134141-ladsgroup.json
  • 13:32 fabfur: begin restarting lvs6002 (T335835)
  • 13:18 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T342617)', diff saved to https://phabricator.wikimedia.org/P49752 and previous config saved to /var/cache/conftool/dbconfig/20230727-131733-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49751 and previous config saved to /var/cache/conftool/dbconfig/20230727-131712-ladsgroup.json
  • 13:15 fabfur: done restarting lvs6001 (T335835)
  • 13:15 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6001.drmrs.wmnet
  • 13:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6001.drmrs.wmnet
  • 13:11 samtar@deploy1002: Finished scap: Backport for Re-enable PC writes for parsoid endpoints (T339867) (duration: 07m 02s)
  • 13:05 samtar@deploy1002: samtar and daniel: Backport for Re-enable PC writes for parsoid endpoints (T339867) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 samtar@deploy1002: Started scap: Backport for Re-enable PC writes for parsoid endpoints (T339867)
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49750 and previous config saved to /var/cache/conftool/dbconfig/20230727-130206-ladsgroup.json
  • 12:54 fabfur: begin restarting lvs6001 (T335835)
  • 12:53 fabfur: done restarting lvs6003 (T335835)
  • 12:48 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P49749 and previous config saved to /var/cache/conftool/dbconfig/20230727-124700-ladsgroup.json
  • 12:45 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 12:43 fabfur: begin restarting lvs6003 (T335835)
  • 12:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49748 and previous config saved to /var/cache/conftool/dbconfig/20230727-123153-ladsgroup.json
  • 12:08 jynus: systemctl stop mariadb@s1 @ db2097
  • 12:07 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Also add square logo for Vector-2022 (duration: 07m 05s)
  • 12:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T342617)', diff saved to https://phabricator.wikimedia.org/P49747 and previous config saved to /var/cache/conftool/dbconfig/20230727-120710-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:01 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Also add square logo for Vector-2022 synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:00 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Also add square logo for Vector-2022
  • 12:00 ladsgroup@deploy1002: Finished scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) (duration: 08m 54s)
  • 11:52 ladsgroup@deploy1002: ladsgroup: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:51 ladsgroup@deploy1002: Started scap: Backport for rdbms: Avoid making wasteful memcached calls in CP (T314434)
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2005-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet2006-dev.private.codfw.wikimedia.cloud on all recursors
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
  • 11:49 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet2005-dev/2006-dev - aborrero@cumin1001"
  • 11:48 jforrester@deploy1002: Finished scap: Backport for Wikifunctions: Add logo, wordmark (duration: 08m 35s)
  • 11:47 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:42 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
  • 11:42 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.private.eqiad.wikimedia.cloud on all recursors
  • 11:41 jforrester@deploy1002: jforrester: Backport for Wikifunctions: Add logo, wordmark synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:39 jforrester@deploy1002: Started scap: Backport for Wikifunctions: Add logo, wordmark
  • 11:37 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 11:31 ladsgroup@deploy1002: backport Cancelled
  • 11:30 ladsgroup@deploy1002: Finished scap: Backport for CentralAuthUser: Don't load user information unless needed (duration: 07m 47s)
  • 11:24 ladsgroup@deploy1002: ladsgroup: Backport for CentralAuthUser: Don't load user information unless needed synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 11:22 ladsgroup@deploy1002: Started scap: Backport for CentralAuthUser: Don't load user information unless needed
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 11:12 fabfur: done restarting lvs3006 (T335835)
  • 11:03 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3006.esams.wmnet
  • 11:00 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3006.esams.wmnet
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
  • 10:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudnet1005/1006 - aborrero@cumin1001"
  • 10:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1005
  • 10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1005
  • 10:50 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudnet1006
  • 10:50 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudnet1006
  • 10:37 taavi: purge edge caches for "https://wikifunctions.org/"
  • 10:35 fabfur: begin restarting lvs3006 (T335835)
  • 10:34 fabfur: done restarting lvs3005 (T335835)
  • 10:33 kevinbazira@deploy1002: Finished deploy [ores/deploy@c30920f]: T342118 (duration: 09m 04s)
  • 10:24 kevinbazira@deploy1002: Started deploy [ores/deploy@c30920f]: T342118
  • 10:14 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3005.esams.wmnet
  • 10:12 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3005.esams.wmnet
  • 09:56 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 09:54 fabfur: begin restarting lvs3005 (T335835)
  • 09:44 fabfur: done restarting lvs3007 (T335835)
  • 09:42 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3007.esams.wmnet
  • 09:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs3007.esams.wmnet
  • 09:38 fabfur: begin restarting lvs3007 (T335835)
  • 09:20 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php --wiki=frwiki --page="Sensibilité électromagnétique" --force` to debug T342488
  • 09:12 fabfur: done restarting lvs1019 (T335835)
  • 09:11 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1019.eqiad.wmnet
  • 09:07 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1019.eqiad.wmnet
  • 08:42 fabfur: begin restarting lvs1019 (T335835)
  • 08:34 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.19 refs T340247
  • 07:54 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:54 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 07:40 XioNoX: reboot lsw1-a1-codfw (test device)
  • 06:53 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 06:39 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:38 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:36 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 06:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 05:57 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 05:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 05:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 05:26 oblivian@deploy1002: Started scap: (no justification provided)
  • 05:26 _joe_: scap is not syncing; just rebuilding the image from scratch to verify the reason for a bug.
  • 05:22 oblivian@deploy1002: Started scap: (no justification provided)
  • 03:19 cstone: payments-wiki upgraded from 2a68dfe2 to 1a6ca7ab
  • 03:04 eileen: civicrm upgraded from 5a84b138 to 16c2e58a
  • 00:54 eileen: civicrm upgraded from 68f29b70 to 5a84b138
  • 00:51 eileen: civicrm upgraded from 853c14f3 to 68f29b70
  • 00:20 eileen: rollback because I got an error when I tried to view - so let's see
  • 00:20 eileen: civicrm rolled back from 68f29b70 to 853c14f3 (locked)
  • 00:17 eileen: civicrm upgraded from 853c14f3 to 68f29b70

2023-07-26

  • 23:01 jforrester@deploy1002: Synchronized wmf-config/interwiki.php: Update interwiki cache now that wikifunctions is here (duration: 06m 52s)
  • 21:53 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wcqs2001.codfw.wmnet
  • 21:46 bking@cumin1001: START - Cookbook sre.hosts.reboot-single for host wcqs2001.codfw.wmnet
  • 21:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49745 and previous config saved to /var/cache/conftool/dbconfig/20230726-212310-ladsgroup.json
  • 21:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49744 and previous config saved to /var/cache/conftool/dbconfig/20230726-210804-ladsgroup.json
  • 21:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 21:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 21:00 taavi: manually attach User:WikiLambda_system to SUL T342811
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P49743 and previous config saved to /var/cache/conftool/dbconfig/20230726-205257-ladsgroup.json
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49742 and previous config saved to /var/cache/conftool/dbconfig/20230726-203751-ladsgroup.json
  • 20:34 taavi@deploy1002: Finished scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) (duration: 26m 17s)
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T342617)', diff saved to https://phabricator.wikimedia.org/P49741 and previous config saved to /var/cache/conftool/dbconfig/20230726-201554-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49740 and previous config saved to /var/cache/conftool/dbconfig/20230726-201533-ladsgroup.json
  • 20:09 taavi@deploy1002: dreamyjazz and taavi: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:08 taavi@deploy1002: Started scap: Backport for clienthints: Start collecting client hints data on testwiki (T341110), CheckUser event table migration: Write new on group0 (T330158)
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49739 and previous config saved to /var/cache/conftool/dbconfig/20230726-200026-ladsgroup.json
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P49738 and previous config saved to /var/cache/conftool/dbconfig/20230726-194520-ladsgroup.json
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49737 and previous config saved to /var/cache/conftool/dbconfig/20230726-193014-ladsgroup.json
  • 18:48 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:47 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:45 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:44 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49736 and previous config saved to /var/cache/conftool/dbconfig/20230726-184430-ladsgroup.json
  • 18:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:44 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49735 and previous config saved to /var/cache/conftool/dbconfig/20230726-184408-ladsgroup.json
  • 18:43 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:37 jforrester@deploy1002: Synchronized wmf-config/: Last fixes for initial wikifunctions.org, he says (duration: 06m 44s)
  • 18:37 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:36 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:34 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:33 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49734 and previous config saved to /var/cache/conftool/dbconfig/20230726-182902-ladsgroup.json
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P49732 and previous config saved to /var/cache/conftool/dbconfig/20230726-181356-ladsgroup.json
  • 18:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:12 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:10 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:09 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49731 and previous config saved to /var/cache/conftool/dbconfig/20230726-175850-ladsgroup.json
  • 17:49 jforrester@deploy1002: Finished scap: Hopefully final update for wikifunctions.org initial config (duration: 07m 30s)
  • 17:41 jforrester@deploy1002: Started scap: Hopefully final update for wikifunctions.org initial config
  • 17:37 jforrester@deploy1002: Finished scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) (duration: 11m 27s)
  • 17:27 jforrester@deploy1002: jforrester: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:25 jforrester@deploy1002: Started scap: Backport for wgNoFollowDomainExceptions: Add wikifunctions.org (T275945)
  • 17:13 jforrester@deploy1002: Finished scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) (duration: 08m 40s)
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T342617)', diff saved to https://phabricator.wikimedia.org/P49730 and previous config saved to /var/cache/conftool/dbconfig/20230726-171244-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49729 and previous config saved to /var/cache/conftool/dbconfig/20230726-171223-ladsgroup.json
  • 17:06 jforrester@deploy1002: jforrester: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:04 jforrester@deploy1002: Started scap: Backport for MWMultiVersion: Alert this code to wikifunctions.org existing (T275945)
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49728 and previous config saved to /var/cache/conftool/dbconfig/20230726-165717-ladsgroup.json
  • 16:53 jforrester@deploy1002: Finished scap: Backport for docroot: Add wikifunctions.org (T275945) (duration: 08m 05s)
  • 16:47 jforrester@deploy1002: jforrester: Backport for docroot: Add wikifunctions.org (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:45 jforrester@deploy1002: Started scap: Backport for docroot: Add wikifunctions.org (T275945)
  • 16:44 fabfur: end reboot of lvs1018 (T335835)
  • 16:43 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1018.eqiad.wmnet
  • 16:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P49727 and previous config saved to /var/cache/conftool/dbconfig/20230726-164211-ladsgroup.json
  • 16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1018.eqiad.wmnet
  • 16:27 jforrester@deploy1002: Finished scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945 (duration: 07m 49s)
  • 16:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49726 and previous config saved to /var/cache/conftool/dbconfig/20230726-162705-ladsgroup.json
  • 16:20 jforrester@deploy1002: Started scap: Initial deploy of wikifunctionswiki in locked-down mode for T275945
  • 16:18 fabfur: begin reboot of lvs1018 (T335835)
  • 16:15 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) (duration: 09m 07s)
  • 16:08 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:07 fabfur: end reboot of lvs1017 (T335835)
  • 16:06 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to prod wgLocalVirtualHosts (T275945)
  • 16:03 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 06m 56s)
  • 15:56 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
  • 15:55 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1017.eqiad.wmnet
  • 15:52 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1017.eqiad.wmnet
  • 15:47 jforrester@deploy1002: Finished scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) (duration: 09m 39s)
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T342617)', diff saved to https://phabricator.wikimedia.org/P49725 and previous config saved to /var/cache/conftool/dbconfig/20230726-154245-ladsgroup.json
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49724 and previous config saved to /var/cache/conftool/dbconfig/20230726-154209-ladsgroup.json
  • 15:39 jforrester@deploy1002: dcausse and jforrester: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:38 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 15:37 jforrester@deploy1002: Started scap: Backport for Load RescoreFunctions from the ExtensionRegistry (T342744)
  • 15:37 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 15:37 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:35 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:34 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:34 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:32 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:32 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:30 fabfur: begin reboot of lvs1017 (T335835)
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49723 and previous config saved to /var/cache/conftool/dbconfig/20230726-152703-ladsgroup.json
  • 15:26 fabfur: end reboot of lvs1020 (T335835)
  • 15:25 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 15:21 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 15:20 fabfur: begin reboot of lvs1020 (T335835)
  • 15:17 fabfur: end reboot of lvs4009 (T335835)
  • 15:13 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4009.ulsfo.wmnet
  • 15:13 ladsgroup@deploy1002: Finished scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) (duration: 14m 06s)
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P49722 and previous config saved to /var/cache/conftool/dbconfig/20230726-151157-ladsgroup.json
  • 15:10 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4009.ulsfo.wmnet
  • 15:00 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:59 ladsgroup@deploy1002: Started scap: Backport for ores-extension: enable lw on eswikiquotes and eswikibooks (T342115)
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49721 and previous config saved to /var/cache/conftool/dbconfig/20230726-145651-ladsgroup.json
  • 14:49 fabfur: begin reboot of lvs4009 (T335835)
  • 14:38 jforrester@deploy1002: Finished scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) (duration: 08m 24s)
  • 14:34 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:33 hnowlan: enabling puppet on A:cp to deploy r/941440
  • 14:32 jforrester@deploy1002: jforrester: Backport for Normalize the skin name when it comes from preferences or useskin (T342733) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:31 kuncung: test
  • 14:30 jforrester@deploy1002: Started scap: Backport for Normalize the skin name when it comes from preferences or useskin (T342733)
  • 14:28 fabfur: end reboot of lvs4008 (T335835)
  • 14:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4008.ulsfo.wmnet
  • 14:24 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4008.ulsfo.wmnet
  • 14:19 hnowlan: disabling puppet on A:cp to deploy r/941440
  • 14:19 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:13 urbanecm@deploy1002: Finished scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) (duration: 12m 33s)
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2151 (T342617)', diff saved to https://phabricator.wikimedia.org/P49720 and previous config saved to /var/cache/conftool/dbconfig/20230726-141228-ladsgroup.json
  • 14:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:02 urbanecm@deploy1002: urbanecm: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:00 urbanecm@deploy1002: Started scap: Backport for Revert "specials: Use cross-wiki aware UserIdentityLookup on Special:UserRights" (T255309 T342747)
  • 14:00 fabfur: begin reboot of lvs4008 (T335835)
  • 13:55 fabfur: end reboot of lvs4010 (T335835)
  • 13:52 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 13:50 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 13:46 fabfur: begin reboot of lvs4010 (T335835)
  • 13:34 jforrester@deploy1002: Finished scap: Backport for Add stream config for iOS schema (T341896) (duration: 20m 16s)
  • 13:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49719 and previous config saved to /var/cache/conftool/dbconfig/20230726-133104-ladsgroup.json
  • 13:23 jforrester@deploy1002: jforrester and tsev: Backport for Add stream config for iOS schema (T341896) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:23 fab@deploy1002: Finished deploy [airflow-dags/research@e7b9253]: (no justification provided) (duration: 00m 07s)
  • 13:22 fab@deploy1002: Started deploy [airflow-dags/research@e7b9253]: (no justification provided)
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49718 and previous config saved to /var/cache/conftool/dbconfig/20230726-131557-ladsgroup.json
  • 13:14 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
  • 13:13 jforrester@deploy1002: sync-world aborted: Backport for Add stream config for iOS schema (T341896) (duration: 11m 00s)
  • 13:05 James_F: Created cu_useragent_clienthints.sql and cu_useragent_clienthints_map.sql on testwiki for T258105
  • 13:02 jforrester@deploy1002: Started scap: Backport for Add stream config for iOS schema (T341896)
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P49717 and previous config saved to /var/cache/conftool/dbconfig/20230726-130051-ladsgroup.json
  • 12:52 jforrester@deploy1002: Synchronized php-1.41.0-wmf.19/extensions/WikiLambda/: Update WikiLambda wmf.19 branch to latest ahead of wikifunctions.org roll-out (duration: 07m 10s)
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49716 and previous config saved to /var/cache/conftool/dbconfig/20230726-124545-ladsgroup.json
  • 12:41 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 12:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 12:36 jforrester@deploy1002: Finished scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) (duration: 07m 39s)
  • 12:30 jforrester@deploy1002: jforrester: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:28 jforrester@deploy1002: Started scap: Backport for ProductionServices: Define the wikifunctions orchestrator access point (T297314)
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T342617)', diff saved to https://phabricator.wikimedia.org/P49714 and previous config saved to /var/cache/conftool/dbconfig/20230726-115528-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49713 and previous config saved to /var/cache/conftool/dbconfig/20230726-115507-ladsgroup.json
  • 11:50 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:48 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:48 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:47 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:46 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:45 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49712 and previous config saved to /var/cache/conftool/dbconfig/20230726-114001-ladsgroup.json
  • 11:32 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:27 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P49711 and previous config saved to /var/cache/conftool/dbconfig/20230726-112454-ladsgroup.json
  • 11:19 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
  • 11:14 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bullseye
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49710 and previous config saved to /var/cache/conftool/dbconfig/20230726-110948-ladsgroup.json
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases1002.eqiad.wmnet
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:03 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 11:01 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:57 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases1002.eqiad.wmnet
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts releases2002.codfw.wmnet
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 eoghan@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:56 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bullseye
  • 10:55 eoghan@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: releases2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - eoghan@cumin1001"
  • 10:53 eoghan@cumin1001: START - Cookbook sre.dns.netbox
  • 10:31 eoghan@cumin1001: START - Cookbook sre.hosts.decommission for hosts releases2002.codfw.wmnet
  • 10:27 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T342617)', diff saved to https://phabricator.wikimedia.org/P49708 and previous config saved to /var/cache/conftool/dbconfig/20230726-102232-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 10:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 09:59 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-eqiad cluster: Roll restart of jvm daemons.
  • 08:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 08:34 jnuche@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.19 refs T340247 (duration: 19m 56s)
  • 08:14 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.19 refs T340247
  • 08:01 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 07:58 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:56 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:52 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • off: updating bookworm netboot image for point release 12.1 ( https://wikitech.wikimedia.org/wiki/Updating_netboot_image_with_newer_kernel#Updating_production_point_release )
  • 07:46 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:37 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
  • 07:37 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 07:36 volans@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
  • 07:36 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 07:35 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:35 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 07:30 volans@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 07:18 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 07:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 06:48 oblivian@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:47 oblivian@cumin1001: START - Cookbook sre.dns.netbox
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:37 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:34 marostegui: Stop mariadb on clouddb1021 T334651
  • 06:33 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 06:31 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 06:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 06:26 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 06:25 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 06:21 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2001.codfw.wmnet,xhgui1001.eqiad.wmnet
  • 06:18 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 06:17 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 06:17 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 06:15 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 01:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-25

  • 22:52 eileen: revision c62433ab -> 8689d10d
  • 21:18 zabe@deploy1002: Finished scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) (duration: 10m 06s)
  • 21:10 zabe@deploy1002: zabe: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:08 zabe@deploy1002: Started scap: Backport for Create UserIdentityValue with correct wiki (T342655), Create UserIdentityValue with correct wiki (T342655)
  • 21:02 zabe@deploy1002: Finished scap: update interwiki cache, gerrit:941057 (duration: 07m 20s)
  • 20:55 zabe@deploy1002: Started scap: update interwiki cache, gerrit:941057
  • 20:53 zabe@deploy1002: Finished scap: T335216 (duration: 08m 24s)
  • 20:46 zabe@deploy1002: zabe: T335216 synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:44 zabe@deploy1002: Started scap: T335216
  • 20:42 zabe: create Wiktionary Mandailing # T335216
  • 20:33 taavi@deploy1002: Finished scap: Backport for Fix text showing on icon only buttons (duration: 12m 08s)
  • 20:23 taavi@deploy1002: taavi and bwang: Backport for Fix text showing on icon only buttons synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:21 taavi@deploy1002: Started scap: Backport for Fix text showing on icon only buttons
  • 18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:24 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 18:23 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 18:21 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
  • 18:21 sukhe: dummy authdns-update returns
  • 18:11 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4003.wikimedia.org
  • 18:06 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4003.wikimedia.org
  • 17:56 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns4004.wikimedia.org
  • 17:51 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns4004.wikimedia.org
  • 16:56 sukhe: dummy authdns-update
  • 16:51 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 16:49 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6002.wikimedia.org
  • 16:45 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6002.wikimedia.org
  • 16:41 fabfur: end rebooting lvs5005 (T335835)
  • 16:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5005.eqsin.wmnet
  • 16:40 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 16:37 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5005.eqsin.wmnet
  • 16:30 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns6001.wikimedia.org
  • 16:26 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns6001.wikimedia.org
  • 16:19 fabfur: begin rebooting lvs5005 (T335835)
  • 15:57 dancy@deploy1002: Finished deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided) (duration: 33m 26s)
  • 15:36 fabfur: lvs5004 restarted and services are reactivating (T335835)
  • 15:28 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5004.eqsin.wmnet
  • 15:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 15:25 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5004.eqsin.wmnet
  • 15:25 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5004.eqsin.wmnet
  • 15:25 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:23 dancy@deploy1002: Started deploy [releng/jenkins-deploy@97b4674] (releasing): (no justification provided)
  • 15:22 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:17 _joe_: removing all tags for docker image openjdk-8-jre T341115
  • 15:16 zabe@deploy1002: Finished scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) (duration: 07m 51s)
  • 15:14 _joe_: removing all tags for docker image openjdk-8-jdk T341115
  • 15:10 zabe@deploy1002: zabe: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 15:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:08 zabe@deploy1002: Started scap: Backport for Add namespace translations for Mandailing (btm) (T335217), Add namespace translations for Mandailing (btm) (T335217)
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:59 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:58 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-eqiad cluster: Roll restart of jvm daemons.
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:58 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 14:58 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:52 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:47 damilare: SmashPig upgraded from 9ee24eef to f40badde
  • 14:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 14:45 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 14:43 fabfur: begin rebooting lvs5004 (T335835)
  • 14:35 fabfur: lvs5006 rebooted and services restarted (T335835)
  • 14:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
  • 14:30 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:30 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:30 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:29 hnowlan: disabling puppet on A:cp for rollout of r/941405
  • 14:28 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:28 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:27 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host lvs5006.eqsin.wmnet
  • 14:26 fabfur@cumin1001: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:24 fabfur: start stopping services and rebooting lvs5006 (T335835)
  • 14:12 damilare: SmashPig upgraded from a9156920 to 9ee24eef
  • 14:02 urbanecm@deploy1002: Finished scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) (duration: 22m 27s)
  • 14:00 sukhe: rolling out pdns-recursor update on A:dns-rec
  • 13:42 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
  • 13:42 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on parse1002.eqiad.wmnet with reason: T339340 - hw troubleshooting
  • 13:41 urbanecm@deploy1002: urbanecm and dreamyjazz: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
  • 13:40 cgoubert@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "re-run to fix mw1486 - cgoubert@cumin1001"
  • 13:40 urbanecm@deploy1002: Started scap: Backport for Enable write new on testwiki for CheckUser event tables migration (T330158)
  • 13:38 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1486.eqiad.wmnet with OS buster
  • 13:38 cgoubert@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - cgoubert@cumin1001"
  • 13:38 urbanecm@deploy1002: Finished scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586) (duration: 07m 28s)
  • 13:30 urbanecm@deploy1002: Started scap: Backport for Add support for writing both new and old to Hooks.php (T341934 T341586), Follow-up: Add support for writing both new and old to Hooks.php (T341586)
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49704 and previous config saved to /var/cache/conftool/dbconfig/20230725-132121-ladsgroup.json
  • 13:20 godog: powercycle parse1002 - T339340
  • 13:17 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-main-codfw cluster: Roll restart of jvm daemons.
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49702 and previous config saved to /var/cache/conftool/dbconfig/20230725-130615-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114', diff saved to https://phabricator.wikimedia.org/P49701 and previous config saved to /var/cache/conftool/dbconfig/20230725-125109-ladsgroup.json
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49700 and previous config saved to /var/cache/conftool/dbconfig/20230725-123602-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2114 (T342617)', diff saved to https://phabricator.wikimedia.org/P49699 and previous config saved to /var/cache/conftool/dbconfig/20230725-120641-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 11:49 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:49 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:48 akosiaris@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:47 akosiaris@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:46 akosiaris@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:45 akosiaris@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:37 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:36 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: openstack - aborrero@cumin1001"
  • 11:34 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:32 akosiaris: T340087 wikidiff2 rollout done. 1 host is unreachable and will need to be reimaged or upgraded manually to pick this up, parse1002.eqiad.wmnet
  • 11:30 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to eqiad. codfw already done.
  • 11:28 akosiaris: restart php on mw1457
  • 11:25 akosiaris: T340087 keep a copy php-wikidiff2_1.13.0-1_amd64.deb in apt1001:/home/akosiaris/wd/ in case of emergency
  • 11:24 akosiaris: T340087 starting wikidiff2 1.41.1 rollout to codfw
  • 10:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
  • 10:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 31 days, 0:00:00 on lvs[1013-1015].eqiad.wmnet with reason: test hosts
  • 09:50 elukey: restart kafka on kafka-main1001 to pick up the new changes - T341558
  • 09:47 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 09:46 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 09:06 slyngs: Restart Tomcat / Apereo CAS on idp1002
  • 09:01 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.19 refs T340247
  • 08:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:51 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.17 (duration: 02m 11s)
  • 08:49 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.19 refs T340247 (duration: 52m 35s)
  • 08:35 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 08:35 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main1001.eqiad.wmnet with reason: Apply a new setting to the Kafka broker
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49696 and previous config saved to /var/cache/conftool/dbconfig/20230725-080326-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49695 and previous config saved to /var/cache/conftool/dbconfig/20230725-080315-root.json
  • 07:57 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.19 refs T340247
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49694 and previous config saved to /var/cache/conftool/dbconfig/20230725-074821-root.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49693 and previous config saved to /var/cache/conftool/dbconfig/20230725-074810-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49692 and previous config saved to /var/cache/conftool/dbconfig/20230725-073317-root.json
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49691 and previous config saved to /var/cache/conftool/dbconfig/20230725-073305-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49690 and previous config saved to /var/cache/conftool/dbconfig/20230725-071812-root.json
  • 07:18 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49689 and previous config saved to /var/cache/conftool/dbconfig/20230725-071801-root.json
  • 07:03 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49688 and previous config saved to /var/cache/conftool/dbconfig/20230725-070307-root.json
  • 07:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49687 and previous config saved to /var/cache/conftool/dbconfig/20230725-070256-root.json
  • 06:48 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49686 and previous config saved to /var/cache/conftool/dbconfig/20230725-064802-root.json
  • 06:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49685 and previous config saved to /var/cache/conftool/dbconfig/20230725-064751-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49684 and previous config saved to /var/cache/conftool/dbconfig/20230725-063258-root.json
  • 06:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49683 and previous config saved to /var/cache/conftool/dbconfig/20230725-063247-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49682 and previous config saved to /var/cache/conftool/dbconfig/20230725-061753-root.json
  • 06:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1213:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49681 and previous config saved to /var/cache/conftool/dbconfig/20230725-061742-root.json
  • 06:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1213 (s5, s6)', diff saved to https://phabricator.wikimedia.org/P49680 and previous config saved to /var/cache/conftool/dbconfig/20230725-061319-root.json
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wdqs[2004-2006].codfw.wmnet
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:10 ryankemper@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 05:52 ryankemper@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wdqs[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin1001"
  • 05:46 ryankemper@cumin1001: START - Cookbook sre.dns.netbox
  • 05:09 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
  • 05:08 ryankemper@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts wdqs[2004-2006].codfw.wmnet
  • 04:56 ryankemper@cumin1001: START - Cookbook sre.hosts.decommission for hosts wdqs[2004-2006].codfw.wmnet
  • 03:47 eileen: civicrm upgraded from ad642712 to 853c14f3
  • 03:31 eileen: civicrm upgraded from d7c8d77e to ad642712 (back to head as the rollback didn't do anything)
  • 03:11 eileen: civicrm changed from ad642712 to d7c8d77e (locked)
  • 02:59 eileen: civicrm upgraded from 6fd25bf6 to ad642712
  • 01:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 01:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 01:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 00:50 wfan: civicrm upgraded from d7c8d77e to 6fd25bf6
  • 00:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye

2023-07-24

  • 23:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 22:46 zabe@deploy1002: Finished scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) (duration: 09m 59s)
  • 22:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 22:37 zabe@deploy1002: zabe: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experime
  • 22:36 zabe@deploy1002: Started scap: Backport for client: Avoid dynamically registering hook handlers (T341102), HookContainer: avoid instantiation of handlers when calling register() (T341102 T340113 T339834)
  • 22:16 jgleeson: civiproxy upgraded from 99cecb92 to c000fc1e
  • 21:28 maryum: Deployed patch for T341565
  • 21:14 sbassett: Deployed updated mitigation for T336027
  • 20:04 dancy@deploy1002: Installing scap version "4.56.0" for 605 hosts
  • 19:29 krinkle@deploy1002: Synchronized lib/: Iaa0cb0c75d4 (duration: 06m 21s)
  • 19:21 krinkle@deploy1002: Synchronized src/Profiler.php: Idada376134 (duration: 06m 30s)
  • 18:09 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 18:08 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 18:08 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 18:07 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 18:06 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:39 sukhe: restart ATS to pick up CR 940953: T339134
  • 16:00 topranks: Re-enabling disabled transport from knams to esams after fiber cleaning T337997
  • 14:53 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 14:44 vgutierrez: Repooling cp4052 (upload) running ATS 9.2.1 - T339134
  • 14:37 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:36 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:36 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:30 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:30 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:29 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:28 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:23 samtar@deploy1002: Finished scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) (duration: 14m 05s)
  • 14:11 samtar@deploy1002: samtar: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 14:09 samtar@deploy1002: Started scap: Backport for Revert "Revert "Run a synthetic test for client side preferences"" (T336527 T339268)
  • 14:07 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 14:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49677 and previous config saved to /var/cache/conftool/dbconfig/20230724-140226-root.json
  • 14:02 samtar@deploy1002: Finished scap: Backport for Revert "Run a synthetic test for client side preferences" (duration: 07m 20s)
  • 14:00 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:56 samtar@deploy1002: samtar: Backport for Revert "Run a synthetic test for client side preferences" synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49676 and previous config saved to /var/cache/conftool/dbconfig/20230724-135604-root.json
  • 13:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49675 and previous config saved to /var/cache/conftool/dbconfig/20230724-135557-root.json
  • 13:54 samtar@deploy1002: Started scap: Backport for Revert "Run a synthetic test for client side preferences"
  • 13:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:51 TheresNoTime: close UTC afternoon backport window
  • 13:51 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:49 TheresNoTime: gerrit:939312 not synced. T336527 T339268
  • 13:48 samtar@deploy1002: Sync cancelled.
  • 13:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49674 and previous config saved to /var/cache/conftool/dbconfig/20230724-134721-root.json
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:44 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49673 and previous config saved to /var/cache/conftool/dbconfig/20230724-134059-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49672 and previous config saved to /var/cache/conftool/dbconfig/20230724-134052-root.json
  • 13:38 taavi: run `taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php mywiktionary --fix` after purging null editing page #131577 for T342516
  • 13:34 samtar@deploy1002: samtar and mabualruz: Backport for Run a synthetic test for client side preferences (T336527 T339268) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:33 samtar@deploy1002: Started scap: Backport for Run a synthetic test for client side preferences (T336527 T339268)
  • 13:32 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49671 and previous config saved to /var/cache/conftool/dbconfig/20230724-133217-root.json
  • 13:31 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 13:31 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 13:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:30 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1013.eqiad.wmnet with OS bullseye
  • 13:28 jclark@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host rdb1014.eqiad.wmnet with OS bullseye
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49669 and previous config saved to /var/cache/conftool/dbconfig/20230724-132555-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49668 and previous config saved to /var/cache/conftool/dbconfig/20230724-132548-root.json
  • 13:25 TheresNoTime: `[samtar@mwmaint1002 ~]$ mwscript namespaceDupes.php mywiktionary --fix` T342516
  • 13:25 samtar@deploy1002: Finished scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) (duration: 21m 28s)
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49667 and previous config saved to /var/cache/conftool/dbconfig/20230724-131712-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49665 and previous config saved to /var/cache/conftool/dbconfig/20230724-131050-root.json
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49664 and previous config saved to /var/cache/conftool/dbconfig/20230724-131043-root.json
  • 13:05 samtar@deploy1002: anzx and samtar: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:03 vgutierrez: depooling cp4052 for some ATS 9.2.1 testing - T339134
  • 13:03 samtar@deploy1002: Started scap: Backport for add citations, concordance, rhymes, reconstruction, therasus, namespaces for mywiktionary (T342516)
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49663 and previous config saved to /var/cache/conftool/dbconfig/20230724-130208-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49662 and previous config saved to /var/cache/conftool/dbconfig/20230724-125545-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49661 and previous config saved to /var/cache/conftool/dbconfig/20230724-125538-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49660 and previous config saved to /var/cache/conftool/dbconfig/20230724-124703-root.json
  • 12:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28458
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49659 and previous config saved to /var/cache/conftool/dbconfig/20230724-124040-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49658 and previous config saved to /var/cache/conftool/dbconfig/20230724-124034-root.json
  • 12:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 28458
  • 12:36 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1014.eqiad.wmnet with OS bullseye
  • 12:31 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49656 and previous config saved to /var/cache/conftool/dbconfig/20230724-123158-root.json
  • 12:31 jclark@cumin1001: START - Cookbook sre.hosts.reimage for host rdb1013.eqiad.wmnet with OS bullseye
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49655 and previous config saved to /var/cache/conftool/dbconfig/20230724-122536-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49654 and previous config saved to /var/cache/conftool/dbconfig/20230724-122529-root.json
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
  • 12:17 jclark@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:17 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1014.eqiad.wmnet']
  • 12:16 marostegui@cumin1001: dbctl commit (dc=all): 'db1187 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49653 and previous config saved to /var/cache/conftool/dbconfig/20230724-121653-root.json
  • 12:16 jclark@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['rdb1013.eqiad.wmnet']
  • 12:14 dcausse@deploy1002: Finished deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page (duration: 00m 12s)
  • 12:14 dcausse@deploy1002: Started deploy [airflow-dags/search@e7b9253]: search: fix table name for wmf_raw.mediawiki_page
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1187', diff saved to https://phabricator.wikimedia.org/P49652 and previous config saved to /var/cache/conftool/dbconfig/20230724-121329-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49651 and previous config saved to /var/cache/conftool/dbconfig/20230724-121031-root.json
  • 12:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2169:3317 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49650 and previous config saved to /var/cache/conftool/dbconfig/20230724-121024-root.json
  • 12:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2169 (s6, s7)', diff saved to https://phabricator.wikimedia.org/P49649 and previous config saved to /var/cache/conftool/dbconfig/20230724-120609-root.json
  • 10:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:51 eoghan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
  • 10:51 eoghan@cumin1001: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on releases2002.codfw.wmnet,releases1002.eqiad.wmnet with reason: Decommissioning prep
  • 10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:46 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:46 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:45 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:44 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:41 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940880 (T342211) to eqiad DC, only one left (disable keepalive on port 80 on A:cp)
  • 10:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:39 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
  • 10:39 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 09:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
  • 09:26 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940873 (T342211) to drmrs DC (disable keepalive on port 80 on A:cp-drmrs)
  • 09:26 dcausse@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:24 dcausse@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/rdf-streaming-updater: apply
  • 09:22 vgutierrez: rollback to trafficserver 9.1.4 in cp4052 - T339134
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.mysql.clone of db1124.eqiad.wmnet onto db1133.eqiad.wmnet
  • 09:13 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:12 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:08 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:08 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:01 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:00 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:59 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:58 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:57 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:56 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:54 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:45 vgutierrez: testing trafficserver 9.2.1 in cp4052 (upload node) - T339134
  • 08:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 08:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 08:36 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 08:36 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 08:33 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:33 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:32 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:31 oblivian@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:30 oblivian@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:30 oblivian@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 08:29 oblivian@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 08:28 oblivian@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:22 dcausse@deploy1002: Finished deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table (duration: 00m 12s)
  • 08:22 dcausse@deploy1002: Started deploy [airflow-dags/search@a47bd0f]: search: Fix partition definition for wmf_raw.mediawiki_page_table
  • 07:40 urbanecm@deploy1002: Finished scap: Backport for ChangeMentor: Refactor the notification conditions (T336875) (duration: 07m 02s)
  • 07:33 urbanecm@deploy1002: Started scap: Backport for ChangeMentor: Refactor the notification conditions (T336875)
  • 07:32 urbanecm@deploy1002: Finished scap: Backport for Add reassignMentees.php maintenance script (T330071) (duration: 14m 39s)
  • 07:31 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:30 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:25 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:23 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:17 urbanecm@deploy1002: Started scap: Backport for Add reassignMentees.php maintenance script (T330071)
  • 06:32 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:28 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:27 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:23 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.

2023-07-23

  • 19:53 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:53 sukhe@cumin2002: START - Cookbook sre.network.cf
  • 01:15 sukhe@cumin2002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 01:15 sukhe@cumin2002: START - Cookbook sre.network.cf

2023-07-21

  • 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1149.eqiad.wmnet with OS bullseye
  • 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
  • 20:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1149.eqiad.wmnet with reason: host reimage
  • 20:21 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1149.eqiad.wmnet with OS bullseye
  • 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1150.eqiad.wmnet with OS bullseye
  • 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 20:15 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:14 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 20:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 20:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
  • 19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1150.eqiad.wmnet with reason: host reimage
  • 19:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1150.eqiad.wmnet with OS bullseye
  • 19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1151.eqiad.wmnet with OS bullseye
  • 19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 19:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
  • 19:21 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1151.eqiad.wmnet with reason: host reimage
  • 19:07 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1151.eqiad.wmnet with OS bullseye
  • 19:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1152.eqiad.wmnet with OS bullseye
  • 18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 18:45 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 18:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
  • 18:16 dancy@deploy1002: Finished scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) (duration: 17m 31s)
  • 18:14 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 18:00 dancy@deploy1002: daimona and dancy: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 17:59 dancy@deploy1002: Started scap: Backport for Enable the CampaignEvents extension before loading CommonSettings-labs (T342452)
  • 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 17:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1152.eqiad.wmnet with reason: host reimage
  • 17:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1152.eqiad.wmnet with OS bullseye
  • 15:35 aokoth@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
  • 15:35 aokoth@cumin1001: START - Cookbook sre.hosts.downtime for 90 days, 0:00:00 on vrts2001.codfw.wmnet with reason: Read-only DB
  • 15:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.sonic-ssh (exit_code=0) for network device lsw1-e8-eqiad
  • 15:11 ayounsi@cumin1001: START - Cookbook sre.network.sonic-ssh for network device lsw1-e8-eqiad
  • 15:10 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:06 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:05 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:04 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:04 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:58 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:55 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:54 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:38 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:37 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power off
  • 14:17 sukhe: sudo ipmitool -I lanplus -H "lvs1016.mgmt.eqiad.wmnet" -U root -E chassis power cycle
  • 14:14 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:43 jclark@cumin1001: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1014.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: START - Cookbook sre.hosts.provision for host rdb1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 12:52 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1014
  • 12:50 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1014
  • 12:50 jclark@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host rdb1013
  • 12:49 jclark@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host rdb1013
  • 12:49 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:49 jclark@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
  • 12:48 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt rdb101[34] - jclark@cumin1001"
  • 12:46 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:39 jbond@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:38 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:38 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:37 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:35 jbond@cumin1001: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:35 jbond@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 12:14 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1016.eqiad.wmnet with OS bookworm
  • 12:03 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 11:47 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 10:49 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:49 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:34 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:33 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:30 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1016.eqiad.wmnet with OS bookworm
  • 10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1015.eqiad.wmnet with OS bookworm
  • 10:27 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:26 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 10:16 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 10:12 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 10:08 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1015.eqiad.wmnet with reason: host reimage
  • 09:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:58 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:57 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 09:53 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1015.eqiad.wmnet with OS bookworm
  • 09:52 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:50 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:47 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 09:36 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 09:33 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 09:30 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1014.eqiad.wmnet with reason: host reimage
  • 09:19 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:19 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1014.eqiad.wmnet with OS bookworm
  • 09:09 jayme: enable puppet on C:confd - T341669
  • 09:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49645 and previous config saved to /var/cache/conftool/dbconfig/20230721-090625-root.json
  • 09:00 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49644 and previous config saved to /var/cache/conftool/dbconfig/20230721-090003-root.json
  • 08:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49643 and previous config saved to /var/cache/conftool/dbconfig/20230721-085955-root.json
  • 08:59 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1014.eqiad.wmnet with OS bookworm
  • 08:56 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 08:56 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1108.eqiad.wmnet with reason: db1108 has been replaced with db1208 - leaving for a few days before decom
  • 08:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49642 and previous config saved to /var/cache/conftool/dbconfig/20230721-085120-root.json
  • 08:48 jayme: ignore "disabling puppet in C:cumin" - was a typo
  • 08:47 jayme: disabling puppet in C:confd - T341669
  • 08:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:47 jayme: disabling puppet in C:cumin - T341669
  • 08:45 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:45 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "cloudcontrol1005 - aborrero@cumin1001 - T341495"
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49641 and previous config saved to /var/cache/conftool/dbconfig/20230721-084459-root.json
  • 08:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49640 and previous config saved to /var/cache/conftool/dbconfig/20230721-084450-root.json
  • 08:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 08:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49639 and previous config saved to /var/cache/conftool/dbconfig/20230721-083616-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49638 and previous config saved to /var/cache/conftool/dbconfig/20230721-082954-root.json
  • 08:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49637 and previous config saved to /var/cache/conftool/dbconfig/20230721-082946-root.json
  • 08:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49636 and previous config saved to /var/cache/conftool/dbconfig/20230721-082111-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49635 and previous config saved to /var/cache/conftool/dbconfig/20230721-081449-root.json
  • 08:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49634 and previous config saved to /var/cache/conftool/dbconfig/20230721-081441-root.json
  • 08:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49633 and previous config saved to /var/cache/conftool/dbconfig/20230721-080606-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49632 and previous config saved to /var/cache/conftool/dbconfig/20230721-075944-root.json
  • 07:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49631 and previous config saved to /var/cache/conftool/dbconfig/20230721-075936-root.json
  • 07:57 zabe@deploy1002: Finished scap: T342405 (duration: 07m 03s)
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49630 and previous config saved to /var/cache/conftool/dbconfig/20230721-075101-root.json
  • 07:50 zabe@deploy1002: Started scap: T342405
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49629 and previous config saved to /var/cache/conftool/dbconfig/20230721-074440-root.json
  • 07:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49628 and previous config saved to /var/cache/conftool/dbconfig/20230721-074431-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49627 and previous config saved to /var/cache/conftool/dbconfig/20230721-073557-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49626 and previous config saved to /var/cache/conftool/dbconfig/20230721-072935-root.json
  • 07:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49625 and previous config saved to /var/cache/conftool/dbconfig/20230721-072927-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1201 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49624 and previous config saved to /var/cache/conftool/dbconfig/20230721-072052-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1201', diff saved to https://phabricator.wikimedia.org/P49623 and previous config saved to /var/cache/conftool/dbconfig/20230721-071623-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3316 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49622 and previous config saved to /var/cache/conftool/dbconfig/20230721-071430-root.json
  • 07:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2171:3315 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49621 and previous config saved to /var/cache/conftool/dbconfig/20230721-071422-root.json
  • 07:12 marostegui: Upgrade dbstore1005 to mariadb 10.6 T334652
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2171 (s5 and s6)', diff saved to https://phabricator.wikimedia.org/P49620 and previous config saved to /var/cache/conftool/dbconfig/20230721-070110-root.json
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3209
  • 06:40 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3209
  • 06:40 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 398203
  • 06:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 398203
  • 06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 139418
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139418
  • 06:36 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 139148
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 139148
  • 04:00 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
  • 04:00 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 0:01:00 on 10 hosts with reason: trying to remove downtime on these new hosts
  • 03:51 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=yes; selector: name=wdqs202([1-2])\.codfw\.wmnet
  • 03:50 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=active; selector: name=wdqs222([0-1])\.codfw\.wmnet
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1153.eqiad.wmnet with OS bullseye
  • 00:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 00:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage

2023-07-20

  • 23:59 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1153.eqiad.wmnet with reason: host reimage
  • 23:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1154.eqiad.wmnet with OS bullseye
  • 23:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 23:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
  • 23:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1154.eqiad.wmnet with reason: host reimage
  • 22:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1154.eqiad.wmnet with OS bullseye
  • 22:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1155.eqiad.wmnet with OS bullseye
  • 22:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:47 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 22:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
  • 22:29 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1155.eqiad.wmnet with reason: host reimage
  • 22:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1155.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1156.eqiad.wmnet with OS bullseye
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 21:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1156.eqiad.wmnet with reason: host reimage
  • 21:18 hashar@deploy1002: Finished deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚 (duration: 00m 07s)
  • 21:18 hashar@deploy1002: Started deploy [integration/docroot@0e476e5]: Tweak Zuul status page css 🥚
  • 21:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
  • 20:25 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:25 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:17 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:44 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.41.0-wmf.18 refs T340246
  • 18:44 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2020.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs20{20}.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2019.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2018.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2017.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2016.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2015.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2014.codfw.wmnet
  • 18:43 bking@cumin1001: conftool action : set/pooled=yes:weight=10; selector: name=wdqs2013.codfw.wmnet
  • 18:41 bking@cumin1001: conftool action : set/pooled=yes,set/weight=10; selector: name=wdqs2013-19.codfw.wmnet
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:38 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 18:38 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:41 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 17:39 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
  • 17:36 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1002.eqiad.wmnet with reason: host reimage
  • 17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:25 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 17:22 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 17:22 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs1013.eqiad.wmnet with OS bookworm
  • 17:21 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 17:12 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - fabfur@cumin1001"
  • 17:09 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 17:00 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
  • 16:57 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 16:56 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1001.eqiad.wmnet with reason: host reimage
  • 16:55 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:53 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs1013.eqiad.wmnet with reason: host reimage
  • 16:52 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcontrol1005.eqiad.wmnet with reason: host reimage
  • 16:49 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 16:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts analytics1075.eqiad.wmnet
  • 16:49 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940190 (T342211) to codfw DC (disable keepalive on port 80 on A:cp-codfw)
  • 16:47 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:43 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 16:41 fabfur@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:40 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 16:38 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:37 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:31 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudcontrol1005.eqiad.wmnet with OS bullseye
  • 16:22 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcontrol1005
  • 16:21 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 16:21 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bullseye
  • 16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
  • 16:20 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005 - aborrero@cumin1001"
  • 16:18 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 16:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 16:16 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 16:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:13 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 16:05 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 16:03 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcontrol1005
  • 16:03 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudcontrol1005
  • 16:02 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 15:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:51 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 15:49 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 15:48 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:48 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bookworm
  • 15:46 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:46 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:46 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:31 elukey: stop kafka main eqiad maintenance - T341558
  • 15:20 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:16 bking@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 15:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 15:08 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1073.eqiad.wmnet with reason: host reimage
  • 15:07 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:06 jbond@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bullseye
  • 14:58 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:58 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940150 (T342211) to ulsfo DC (disable keepalive on port 80 on A:cp-ulsfo)
  • 14:56 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:56 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
  • 14:51 bking@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on flink-zk1003.eqiad.wmnet with reason: host reimage
  • 14:51 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts analytics1073.eqiad.wmnet
  • 14:45 herron: roll restart codfw/eqiad low-traffic pybals to add prometheus-https T326657
  • 14:45 btullis@cumin1001: START - Cookbook sre.hadoop.init-hadoop-workers for hosts analytics1073.eqiad.wmnet
  • 14:41 apine@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:41 apine@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:38 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:38 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:37 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:36 sukhe: run agent on cumin -b1 -s30 'A:dns-rec and not P{dns4004*}'
  • 14:34 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
  • 14:32 bking@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 14:31 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1075.eqiad.wmnet with reason: host reimage
  • 14:30 sukhe: disable puppet on A:dns-rec to slowly roll out CR 937991
  • 14:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:14 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 14:13 sukhe: dns1004 upgrade to pdns-rec 4.8.4: T341611
  • 14:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 14:04 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 14:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 14:01 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:59 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:58 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:57 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 13:57 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 13:55 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 13:55 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:54 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:54 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 13:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:52 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:51 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:48 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:45 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 13:32 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:31 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:21 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:18 cmooney@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 13:18 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:17 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:12 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:10 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:09 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for vlan ints lsw1-f8-eqiad - cmooney@cumin1001"
  • 13:04 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 13:00 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:56 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:44 topranks: LDAP - adding user ifrahkh to groups wmde & nda
  • 12:43 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:43 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:24 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:22 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
  • 12:20 zabe@deploy1002: Finished scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) (duration: 08m 22s)
  • 12:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch gw ips. - cmooney@cumin1001"
  • 12:13 zabe@deploy1002: zabe and dreamyjazz: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 12:13 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:12 zabe@deploy1002: Started scap: Backport for SpecialUserRights: Check for username to be temporary (T340468 T342322)
  • 11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:53 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
  • 11:52 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for assigned switch loopbacks. - cmooney@cumin1001"
  • 11:50 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 11:35 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940101 (T342211) to eqsin DC (disable keepalive on port 80 on A:cp-eqsin)
  • 10:53 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:40 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:33 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:26 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1357.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1357.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/pooled=yes; selector: name=mw1356.eqiad.wmnet
  • 09:25 filippo@cumin1001: conftool action : set/weight=10; selector: name=mw1356.eqiad.wmnet
  • 09:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1357.eqiad.wmnet
  • 09:19 filippo@cumin1001: conftool action : set/pooled=no; selector: name=mw1356.eqiad.wmnet
  • 09:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:15 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove reverse dns for IP allocated in error. - cmooney@cumin1001"
  • 09:12 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 08:31 fabfur: applying https://gerrit.wikimedia.org/r/c/operations/puppet/+/940091 (T342211) to esams DC (disable keepalive on port 80)
  • 08:29 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:25 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 08:24 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 08:24 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:23 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:21 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 08:21 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 07:56 apergos: UTC morning backport and config training window really complete
  • 07:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:40 apergos: UTC morning backport and config training window reopened for fix to the last noc patch
  • 07:37 apergos: UTC morning backport and config training window complete
  • 07:36 ariel@deploy1002: Finished scap: Backport for noc/db.php: use the new etcd fetch function (T341859) (duration: 09m 14s)
  • 07:29 ariel@deploy1002: oblivian and ariel: Backport for noc/db.php: use the new etcd fetch function (T341859) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:27 ariel@deploy1002: Started scap: Backport for noc/db.php: use the new etcd fetch function (T341859)
  • 07:25 ariel@deploy1002: Finished scap: Backport for noc: add script to dump etcd db config (T341859) (duration: 09m 35s)
  • 07:17 ariel@deploy1002: oblivian and ariel: Backport for noc: add script to dump etcd db config (T341859) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:16 ariel@deploy1002: Started scap: Backport for noc: add script to dump etcd db config (T341859)
  • 07:12 ariel@deploy1002: Finished scap: Backport for Enable EditInSequence in pawikisource (duration: 09m 52s)
  • 07:04 ariel@deploy1002: ariel and soda: Backport for Enable EditInSequence in pawikisource synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:02 ariel@deploy1002: Started scap: Backport for Enable EditInSequence in pawikisource
  • 06:37 elukey: start kafka main eqiad maintenance (partitions rebalancing) - T341558
  • 04:33 oblivian@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=parse1002.*
  • 04:28 eileen: civicrm upgraded from 0cde2608 to d7c8d77e
  • 01:46 tstarling@deploy1002: Synchronized php-1.41.0-wmf.18/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 08m 32s)
  • 01:35 tstarling@deploy1002: Synchronized php-1.41.0-wmf.17/includes/diff/DifferenceEngine.php: fix prod error T342099, T341961 (duration: 09m 20s)

2023-07-19

  • 22:36 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:36 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 22:08 eileen: civicrm upgraded from 7642b3d9 to 0cde2608
  • 21:41 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:38 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:37 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 21:37 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:36 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:32 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 21:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 05s)
  • 21:26 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:21 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 47s)
  • 21:20 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:55 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:54 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:39 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
  • 20:39 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 20:39 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:38 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:38 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 20:33 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 20:33 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 20:31 TheresNoTime: backport window closed
  • 20:28 samtar@deploy1002: Finished scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) (duration: 17m 09s)
  • 20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
  • 20:24 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add reverse dns for spine linknets eqiad - cmooney@cumin1001"
  • 20:22 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 20:12 samtar@deploy1002: samtar and hubaishan: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:11 samtar@deploy1002: Started scap: Backport for Replace underscores with spaces in 4 Arabic sitenames (T337725)
  • 19:45 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-durum (exit_code=0) rolling reboot on A:durum and A:durum
  • 19:42 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 19:41 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:40 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 19:40 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:40 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:39 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 19:37 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:37 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts flink-zk1003.eqiad.wmnet
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:36 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 19:35 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: flink-zk1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - bking@cumin1001"
  • 19:28 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 19:24 bking@cumin1001: START - Cookbook sre.hosts.decommission for hosts flink-zk1003.eqiad.wmnet
  • 19:03 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pybal-test2003.codfw.wmnet
  • 18:58 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host pybal-test2003.codfw.wmnet
  • 18:41 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-durum rolling reboot on A:durum and A:durum
  • 18:29 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough and A:wikidough
  • 18:25 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.18 refs T340246
  • 17:58 ayounsi@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:50 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host lvs1013.eqiad.wmnet with OS bullseye
  • 17:49 Amir1: powercycled db1218 (T342284)
  • 17:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
  • 17:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1218.eqiad.wmnet with reason: Maint
  • 17:41 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 17:40 sukhe@cumin2002: dbctl commit (dc=all): 'Depool db1218', diff saved to https://phabricator.wikimedia.org/P49603 and previous config saved to /var/cache/conftool/dbconfig/20230719-174019-sukhe.json
  • 17:40 sukhe: depool db1218
  • 17:32 sukhe: dummy run of authdns-update
  • 17:29 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1011.eqiad.wmnet
  • 17:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1011.eqiad.wmnet
  • 17:22 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1010.eqiad.wmnet
  • 17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5004.wikimedia.org
  • 17:16 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1010.eqiad.wmnet
  • 17:15 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5004.wikimedia.org
  • 17:09 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough and A:wikidough
  • 17:02 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 17:00 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host druid1009.eqiad.wmnet
  • 16:56 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:55 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host druid1009.eqiad.wmnet
  • 16:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 16:47 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:30 joal@deploy1002: Finished deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs (duration: 00m 15s)
  • 16:29 joal@deploy1002: Started deploy [airflow-dags/analytics@4c06501]: Fix bug introduced in cassandra loading jobs
  • 16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:26 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:25 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 16:20 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 16:20 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 cmooney@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:17 cmooney@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS for e4 mgmt entries - cmooney@cumin1001"
  • 16:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 15:55 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:46 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:44 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:43 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 15:35 apergos: dumpsdata1007 is now the fallback host for sql/xml dumps and for misc dumps. dumpsdata1004, the former fallback host, is now a spare.
  • 15:34 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
  • 15:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:30 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:28 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
  • 15:25 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: trying to resolve netbox issues - sukhe@cumin2002"
  • 15:23 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:23 sukhe@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:23 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bullseye
  • 15:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 15:19 ayounsi@cumin1001: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:18 ayounsi@cumin1001: START - Cookbook sre.dns.netbox
  • 15:18 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 15:14 robh: mw140[89] downtime for relocation per T308339
  • 15:13 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 15:11 robh: mw141[01] returned to service per T308339
  • 15:11 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 15:11 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:11 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1411
  • 15:11 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1411
  • 15:09 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 15:09 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 15:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:03 fabfur: disabling keepalive on port 80 for cp5024 https://gerrit.wikimedia.org/r/939707 (T342211)
  • 14:59 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 14:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:58 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 14:54 ayounsi@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:54 jclark@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
  • 14:53 jclark@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcontrol1005 - jclark@cumin1001"
  • 14:51 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:49 robh: mw141[23] returned to service per T308339. ignore typo of mw1414 it is uninvolved
  • 14:48 robh: mw141[34] returned to service per T308339
  • 14:40 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1412
  • 14:39 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1412
  • 14:39 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mw1413
  • 14:38 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host mw1413
  • 14:37 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dns5003.wikimedia.org
  • 14:35 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:33 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:33 sukhe@cumin2002: START - Cookbook sre.hosts.reboot-single for host dns5003.wikimedia.org
  • 14:30 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 14:30 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:28 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:28 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 14:21 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 14:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:19 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 14:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 14:16 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:14 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:14 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 14:14 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 14:07 robh: mw141[23] downtimed and relocating per T308339
  • 13:54 Lucas_WMDE: pulled tests: Test setting names (T342249) to deploy1002 (no scap sync needed, tests-only change)
  • 13:46 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 13:46 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:46 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:45 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:42 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 13:42 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:42 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 13:39 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 13:31 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 13:29 fabfur: aborted previous operations, no need to disable puppet to apply that CR (https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661) (T342211)
  • 13:27 fabfur: temporarily disable puppet on cp3052 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939661 (T342211)
  • 13:26 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:15 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 13:13 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) (duration: 10m 47s)
  • 13:04 lucaswerkmeister-wmde@deploy1002: ssastry and lucaswerkmeister-wmde: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:02 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Fix incorrect use of UseLegacyMediaStyles (missing "wg" prefix) (T318433)
  • 12:43 joal@deploy1002: Finished deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs (duration: 00m 14s)
  • 12:43 joal@deploy1002: Started deploy [airflow-dags/analytics@87be328]: Refactor cassandra loading jobs
  • 12:27 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/services/ipoid: apply
  • 12:27 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/services/ipoid: apply
  • 12:22 jbond: switch puppetboard.wikimedia.org to use puppet7 infrastructure
  • 12:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
  • 12:17 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1016.eqiad.wmnet
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:47 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 11:45 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1016.eqiad.wmnet
  • 11:13 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2] (duration: 01m 43s)
  • 11:12 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@eaabff2]
  • 11:11 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2] (duration: 00m 04s)
  • 11:11 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2] (thin): Regular analytics weekly train THIN [analytics/refinery@eaabff2]
  • 11:09 jebe@deploy1002: Finished deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2] (duration: 10m 24s)
  • 10:59 jebe@deploy1002: Started deploy [analytics/refinery@eaabff2]: Regular analytics weekly train [analytics/refinery@eaabff2]
  • 10:02 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:54 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:54 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:48 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:43 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:14 btullis@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 04s)
  • 09:14 btullis@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 09:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49599 and previous config saved to /var/cache/conftool/dbconfig/20230719-091205-root.json
  • 09:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49598 and previous config saved to /var/cache/conftool/dbconfig/20230719-090328-root.json
  • 08:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49597 and previous config saved to /var/cache/conftool/dbconfig/20230719-085700-root.json
  • 08:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49596 and previous config saved to /var/cache/conftool/dbconfig/20230719-084823-root.json
  • 08:45 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49595 and previous config saved to /var/cache/conftool/dbconfig/20230719-084156-root.json
  • 08:38 dcausse: closing the UTC morning backport window
  • 08:37 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 59s)
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49594 and previous config saved to /var/cache/conftool/dbconfig/20230719-083319-root.json
  • 08:30 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:29 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
  • 08:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49593 and previous config saved to /var/cache/conftool/dbconfig/20230719-082651-root.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49592 and previous config saved to /var/cache/conftool/dbconfig/20230719-081814-root.json
  • 08:11 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49591 and previous config saved to /var/cache/conftool/dbconfig/20230719-081146-root.json
  • 08:10 dcausse@deploy1002: Finished scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate (duration: 07m 36s)
  • 08:04 dcausse@deploy1002: dcausse: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49590 and previous config saved to /var/cache/conftool/dbconfig/20230719-080309-root.json
  • 08:02 dcausse@deploy1002: Started scap: Backport for Use the LinksUpdate::isRecursive flag again to route cirrusSearchLinksUpdate
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49589 and previous config saved to /var/cache/conftool/dbconfig/20230719-075642-root.json
  • 07:54 _joe_: ran scap pull, pool on parse1002 after powercycling
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49588 and previous config saved to /var/cache/conftool/dbconfig/20230719-074804-root.json
  • 07:47 _joe_: powercycling parse1002: console blank, unreachable over the network
  • 07:46 dcausse@deploy1002: Backport cancelled.
  • 07:45 oblivian@cumin1001: conftool action : set/pooled=inactive; selector: name=parse1002.eqiad.wmnet
  • 07:41 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49587 and previous config saved to /var/cache/conftool/dbconfig/20230719-074137-root.json
  • 07:36 dcausse@deploy1002: Finished scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension (duration: 17m 44s)
  • 07:33 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49586 and previous config saved to /var/cache/conftool/dbconfig/20230719-073300-root.json
  • 07:26 marostegui@cumin1001: dbctl commit (dc=all): 'db1180 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49585 and previous config saved to /var/cache/conftool/dbconfig/20230719-072632-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1180', diff saved to https://phabricator.wikimedia.org/P49584 and previous config saved to /var/cache/conftool/dbconfig/20230719-072207-root.json
  • 07:20 dcausse@deploy1002: dcausse and abi: Backport for Add channel for TtmServerMessageUpdate of Translate extension synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 07:18 dcausse@deploy1002: Started scap: Backport for Add channel for TtmServerMessageUpdate of Translate extension
  • 07:17 marostegui@cumin1001: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49583 and previous config saved to /var/cache/conftool/dbconfig/20230719-071755-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2158', diff saved to https://phabricator.wikimedia.org/P49582 and previous config saved to /var/cache/conftool/dbconfig/20230719-071204-root.json
  • 06:23 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49581 and previous config saved to /var/cache/conftool/dbconfig/20230719-062313-root.json
  • 06:08 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49580 and previous config saved to /var/cache/conftool/dbconfig/20230719-060809-root.json
  • 05:53 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49579 and previous config saved to /var/cache/conftool/dbconfig/20230719-055304-root.json
  • 05:38 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49578 and previous config saved to /var/cache/conftool/dbconfig/20230719-053759-root.json
  • 05:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49577 and previous config saved to /var/cache/conftool/dbconfig/20230719-052254-root.json
  • 05:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49576 and previous config saved to /var/cache/conftool/dbconfig/20230719-050750-root.json
  • 04:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 3%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49575 and previous config saved to /var/cache/conftool/dbconfig/20230719-045245-root.json
  • 04:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling after maintenance', diff saved to https://phabricator.wikimedia.org/P49574 and previous config saved to /var/cache/conftool/dbconfig/20230719-043740-root.json
  • 00:16 eileen: civicrm upgraded from 67c526e7 to 7642b3d9

2023-07-18

  • 22:51 brett@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on P{doh5002*} and A:wikidough
  • 22:44 brett@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on P{doh5002*} and A:wikidough
  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:34 bking@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:32 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:32 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 22:24 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 22:24 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:24 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:23 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:18 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 22:18 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:18 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:16 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 22:12 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 22:12 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1003.eqiad.wmnet
  • 22:06 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:28 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host analytics1073.eqiad.wmnet
  • 21:22 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 07m 42s)
  • 21:15 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 21:15 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1003.eqiad.wmnet with OS bookworm
  • 21:15 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:14 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:14 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1003.eqiad.wmnet on all recursors
  • 21:14 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1003.eqiad.wmnet on all recursors
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:14 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:13 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1003.eqiad.wmnet - bking@cumin1001"
  • 21:10 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 21:10 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1003.eqiad.wmnet
  • 21:03 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 21:03 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:03 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 21:02 urbanecm@deploy1002: Finished scap: Backport for Don't log for documentElement (nodeType 9) (T340081) (duration: 10m 01s)
  • 21:02 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:54 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Don't log for documentElement (nodeType 9) (T340081) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:52 urbanecm@deploy1002: Started scap: Backport for Don't log for documentElement (nodeType 9) (T340081)
  • 20:48 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 20:43 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:41 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 20:28 urbanecm@deploy1002: Finished scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) (duration: 10m 28s)
  • 20:26 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:26 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
  • 20:19 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes de
  • 20:17 urbanecm@deploy1002: Started scap: Backport for Add additional debugging closest bug (T340081), Add additional debugging closest bug (T340081), Fixes: Mobile login watermark large and uncentered (T341812)
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for Deploy new logos (T341260 T341243 T341912) (duration: 09m 50s)
  • 20:07 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Deploy new logos (T341260 T341243 T341912) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Deploy new logos (T341260 T341243 T341912)
  • 19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
  • 19:53 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1075.eqiad.wmnet']
  • 19:53 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:52 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:50 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 19:49 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 19:49 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 18:57 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 18s)
  • 18:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 18:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:51 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 17s)
  • 18:51 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 18:16 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1002.eqiad.wmnet
  • 18:16 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.18 refs T340246
  • 17:46 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 17:45 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1075.eqiad.wmnet with OS bullseye
  • 17:30 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1016
  • 17:30 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1016
  • 17:29 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:29 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
  • 17:29 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016 relocation - robh@cumin1001"
  • 17:27 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 17:25 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1002.eqiad.wmnet with OS bookworm
  • 17:21 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:20 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1002.eqiad.wmnet on all recursors
  • 17:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1002.eqiad.wmnet on all recursors
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:19 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling reboot on A:wikidough-drmrs and A:wikidough
  • 17:19 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1002.eqiad.wmnet - bking@cumin1001"
  • 17:16 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 17:16 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1002.eqiad.wmnet
  • 17:07 jelto@cumin1001: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:gitlab-runner
  • 17:04 sukhe@cumin2002: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling reboot on A:wikidough-drmrs and A:wikidough
  • 17:02 dancy@deploy1002: Installation of scap version "4.55.0" completed for 605 hosts
  • 17:01 dancy@deploy1002: Installing scap version "4.55.0" for 605 hosts
  • 16:33 dancy@deploy1002: Pruned MediaWiki: 1.41.0-wmf.16 (duration: 02m 11s)
  • 16:30 dancy@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.18 refs T340246 (duration: 46m 15s)
  • 16:28 elukey: maintenance finished for kafka main-codfw
  • 16:24 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1075.eqiad.wmnet with OS bullseye
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1015
  • 16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1015
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1014
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
  • 16:03 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1014
  • 16:02 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs10145 relocation - robh@cumin1001"
  • 16:01 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
  • 16:00 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 15:44 dancy@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.18 refs T340246
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1016.eqiad.wmnet
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:31 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:31 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1016.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 15:28 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:23 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1016.eqiad.wmnet
  • 15:22 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:21 robh@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs1013
  • 15:21 robh@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host lvs1013
  • 15:20 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:18 robh@cumin1001: START - Cookbook sre.hosts.reimage for host lvs1013.eqiad.wmnet with OS bullseye
  • 15:08 robh@cumin1001: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1015.eqiad.wmnet
  • 15:02 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 15:00 robh@cumin1001: START - Cookbook sre.hosts.provision for host lvs1013.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:57 robh@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 robh@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
  • 14:56 robh@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013 relocation - robh@cumin1001"
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1198', diff saved to https://phabricator.wikimedia.org/P49571 and previous config saved to /var/cache/conftool/dbconfig/20230718-145529-root.json
  • 14:54 robh@cumin1001: START - Cookbook sre.dns.netbox
  • 14:54 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1015.eqiad.wmnet
  • 14:45 sukhe: dns2004 upgrade to pdns-rec 4.8.4: T341611
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1014.eqiad.wmnet
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:36 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1014.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:36 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: Release v0.6.3 - ayounsi@cumin1001
  • 14:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:35 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:34 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 14:34 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:33 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:33 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:30 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:29 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1014.eqiad.wmnet
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lvs1013.eqiad.wmnet
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 fabfur@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:22 fabfur@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lvs1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - fabfur@cumin1001"
  • 14:19 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 14:19 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:19 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:18 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:16 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 14:16 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:16 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:12 fabfur@cumin1001: START - Cookbook sre.dns.netbox
  • 14:10 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 14:05 XioNoX: asw2-esams# set interfaces xe-4/0/4 disable - T342121
  • 14:04 jforrester@deploy1002: Finished scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) (duration: 08m 04s)
  • 14:04 fabfur@cumin1001: START - Cookbook sre.hosts.decommission for hosts lvs1013.eqiad.wmnet
  • 13:58 jforrester@deploy1002: jforrester and daimona: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:56 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 13:56 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
  • 13:56 jforrester@deploy1002: Started scap: Backport for private/readme.php: Add $wgCampaignEventsProgramsAndEventsDashboardAPISecret (T320260)
  • 13:55 jforrester@deploy1002: Finished scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) (duration: 21m 19s)
  • 13:43 jbond: upload python3-conftool_2.2.2-1+deb12u1
  • 13:43 jbond: upload python3-conftool_2.2.2-1
  • 13:43 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:42 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:42 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 13:41 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 13:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:39 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:35 jforrester@deploy1002: daimona and jforrester: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:35 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
  • 13:35 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:34 jforrester@deploy1002: Started scap: Backport for prod: Enable wgCampaignEventsProgramsAndEventsDashboardInstance (T320260)
  • 13:26 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host analytics1073.eqiad.wmnet
  • 13:26 jforrester@deploy1002: Finished scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) (duration: 07m 52s)
  • 13:20 stevemunene@deploy1002: Finished deploy [airflow-dags/analytics_test@be05071]: (no justification provided) (duration: 00m 03s)
  • 13:20 stevemunene@deploy1002: Started deploy [airflow-dags/analytics_test@be05071]: (no justification provided)
  • 13:20 jforrester@deploy1002: jforrester: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945) synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:18 jforrester@deploy1002: Started scap: Backport for Add wikifunctions.org to wgCentralNoticeContentSecurityPolicy (T275945)
  • 13:17 jforrester@deploy1002: sync-world aborted: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 00m 06s)
  • 13:17 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
  • 13:16 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:15 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:13 jforrester@deploy1002: Finished scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) (duration: 08m 40s)
  • 13:12 derick@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 13:10 derick@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 13:09 derick@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 13:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 13:08 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.mgmt.eqiad.wmnet']
  • 13:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 btullis@cumin1001: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['analytics1073.eqiad.wmnet']
  • 13:07 derick@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 13:06 derick@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 13:06 jforrester@deploy1002: jforrester: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 13:04 derick@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 13:04 jforrester@deploy1002: Started scap: Backport for Follow-up ca3aa70754: Drop 30x30px Notifications icons, unused for 7 years (T147219)
  • 13:00 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-masters (exit_code=0) restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1074.eqiad.wmnet with OS bullseye
  • 12:42 btullis@cumin1001: START - Cookbook sre.hosts.dhcp for host analytics1073.eqiad.wmnet
  • 12:40 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host analytics1073.eqiad.wmnet with OS bullseye
  • 12:37 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:37 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:37 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:28 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop test cluster: Restart of jvm daemons.
  • 12:23 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 12:07 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1015.eqiad.wmnet
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:04 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1074.eqiad.wmnet with reason: host reimage
  • 12:04 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1015.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 12:01 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1015.eqiad.wmnet
  • 11:51 jbond@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:50 jbond@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
  • 11:39 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:27 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:25 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:25 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:24 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-f8-eqiad
  • 11:24 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
  • 11:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2004.codfw.wmnet
  • 11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
  • 11:22 ayounsi@cumin1001: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1074.eqiad.wmnet with OS bullseye
  • 11:20 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1073.eqiad.wmnet with OS bullseye
  • 11:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 11:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2003.codfw.wmnet
  • 11:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1071.eqiad.wmnet with OS bullseye
  • 11:06 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 11:03 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2002.codfw.wmnet
  • 10:57 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:56 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:55 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 10:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be2001.codfw.wmnet
  • 10:42 topranks: repool esams after successful move of cr3-knams to new rack T337997
  • 10:41 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 10:40 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:39 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 10:35 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 10:32 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1004.eqiad.wmnet
  • 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 10:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1003.eqiad.wmnet
  • 10:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cr3-knams,cr3-knams IPv6
  • 10:24 cmooney@cumin1001: START - Cookbook sre.hosts.remove-downtime for cr3-knams,cr3-knams IPv6
  • 10:20 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
  • 10:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1071.eqiad.wmnet with reason: host reimage
  • 10:11 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 10:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1002.eqiad.wmnet
  • 10:05 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 10:02 fabfur: fix last entry: correct CR is https://gerrit.wikimedia.org/r/939242
  • 10:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 10:02 fabfur: enable puppet on A:cp-esams for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
  • 10:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 10:00 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1071.eqiad.wmnet with OS bullseye
  • 09:52 fabfur: disable puppet on A:cp-esams to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939242 (T340983)
  • 09:51 arturo: deploying https://gerrit.wikimedia.org/r/c/operations/homer/public/+/938819 via homer to cr-eqiad & cr-codfw
  • 09:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:30 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:29 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-be1001.eqiad.wmnet
  • 09:28 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:27 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 09:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 09:24 XioNoX: remove asw-b1-codfw from asw-b-codfw VC - T342076
  • 09:21 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:21 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 09:17 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:16 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:16 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 09:15 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 09:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 09:08 ladsgroup@deploy1002: Finished scap: Backport for ores: use envoy proxy for Lift Wing (T319170) (duration: 14m 56s)
  • 09:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:02 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 08:58 fabfur: enable puppet on A:cp-eqiad for https://gerrit.wikimedia.org/r/939235 (T340983) (hosts will run puppet with the usual schedule)
  • 08:57 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores: use envoy proxy for Lift Wing (T319170) synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 08:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 08:56 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 08:55 fabfur: disable puppet on A:cp-eqiad to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/939235 (T340983)
  • 08:53 ladsgroup@deploy1002: Started scap: Backport for ores: use envoy proxy for Lift Wing (T319170)
  • 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 08:47 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 08:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 08:37 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 08:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 08:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 08:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 08:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 08:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 08:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 08:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:17 fabfur: enable puppet on A:cp-drmrs for https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983) (hosts will run puppet with the usual schedule)
  • 08:16 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 08:13 fabfur: disable puppet on A:cp-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938902/ (T340983)
  • 08:09 topranks: cr3-knams going offline for move
  • 08:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 08:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 07:16 elukey: restart kafka main-codfw rebalances (long maintenance) - T341558
  • 06:48 XioNoX: disable asw-b-codfw:ae0 (to cloudsw1-b1-codfw) - T342076
  • 06:36 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router
  • 06:36 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cr3-knams,cr3-knams IPv6 with reason: Downtime cr3-knams ahead of remote hands moving router

2023-07-17

  • 21:57 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 02m 10s)
  • 21:55 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 21:55 btullis@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 136m 46s)
  • 21:53 bking@cumin1001: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host flink-zk1001.eqiad.wmnet
  • 21:52 bking@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 21:37 eileen: civicrm upgraded from 2c60d58d to 67c526e7
  • 21:19 jgleeson: payments-wiki upgraded from d76b9085 to c9e298c9
  • 21:16 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
  • 21:15 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:01 bking@cumin1001: START - Cookbook sre.hosts.reimage for host flink-zk1001.eqiad.wmnet with OS bookworm
  • 21:00 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 21:00 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) flink-zk1001.eqiad.wmnet on all recursors
  • 20:59 bking@cumin1001: START - Cookbook sre.dns.wipe-cache flink-zk1001.eqiad.wmnet on all recursors
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:59 bking@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:51 bking@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM flink-zk1001.eqiad.wmnet - bking@cumin1001"
  • 20:43 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:19 taavi: taavi@mwmaint1002 ~ $ echo "https://en.wikipedia.org/static/images/mobile/copyright/wikiquote-wordmark-bn.svg" | mwscript purgeList.php --wiki enwiki
  • 20:13 taavi@deploy1002: Finished scap: Backport for bnwikiquote: Update wordmark (T341910) (duration: 08m 34s)
  • 20:06 taavi@deploy1002: taavi and stang: Backport for bnwikiquote: Update wordmark (T341910) synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 20:05 taavi@deploy1002: Started scap: Backport for bnwikiquote: Update wordmark (T341910)
  • 20:03 bking@cumin1001: START - Cookbook sre.dns.netbox
  • 20:03 bking@cumin1001: START - Cookbook sre.ganeti.makevm for new host flink-zk1001.eqiad.wmnet
  • 19:38 btullis@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 18:58 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 18:58 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 18:48 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:34 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 41s)
  • 17:31 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 17:19 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 01m 50s)
  • 17:18 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 16:42 urbanecm@deploy1002: Finished scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) (duration: 08m 43s)
  • 16:34 urbanecm@deploy1002: urbanecm: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994) synced to the testservers mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)
  • 16:33 urbanecm@deploy1002: Started scap: Backport for Fix UserDatabaseHelper::hasMainspaceEdits (T341994)
  • 16:29 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:29 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:28 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:12 elukey: stop kafka-main codfw maintenance - T341558
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 16:08 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 16:07 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 16:05 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 16:04 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 16:04 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 16:02 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 15:57 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:56 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:55 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 15:50 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:49 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 15:49 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 15:49 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:49 fabfur@cumin1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:48 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 15:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 15:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 15:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 15:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 15:14 dancy@deploy1002: Installing scap version "4.54.0" for 605 hosts
  • 15:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 15:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:09 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 15:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 15:04 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 15:02 sukhe: dns5003 upgrade to pdns-rec 4.8.4: T341611
  • 14:59 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 14:57 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1072.eqiad.wmnet with OS bullseye
  • 14:41 fabfur@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:39 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 14:38 fabfur@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 14:36 elukey: restart rsyslog on centrallog1002 ("peer did not provide a certificate, not permitted to talk to it")
  • 14:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 14:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 14:24 klausman@puppetmaster1001: conftool action : set/pooled=yes; selector: name=ores2003.codfw.wmnet
  • 14:21 klausman@puppetmaster1001: conftool action : set/pooled=no; selector: name=ores2003.codfw.wmnet
  • 14:20 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 14:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 14:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 14:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 14:17 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:16 fabfur@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 14:13 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
  • 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:12 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 14:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 14:10 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1072.eqiad.wmnet with reason: host reimage
  • 14:10 elukey: start kafka partitions rebalance for main-codfw (long running maintenance, see https://phabricator.wikimedia.org/T341558)
  • 14:09 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:08 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1003.eqiad.wmnet
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1003.eqiad.wmnet
  • 14:03 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 14:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 13:54 lucaswerkmeister-wmde: Deployed security patch for T340217
  • 13:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 13:53 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 13:50 btullis@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:48 fabfur@cumin1001: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 13:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 13:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 13:46 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 13:43 akosiaris: deploy removal of nutcracker from thumbor. T318695
  • 13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:40 fabfur: reimaging cp4037 as preparatory test for knams migration
  • 13:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 13:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 13:37 taavi@deploy1002: Finished scap: Backport for NewImpact: fix undefined log function (T341865) (duration: 10m 19s)
  • 13:33 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1072.eqiad.wmnet with OS bullseye
  • 13:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 13:28 taavi@deploy1002: taavi and urbanecm: Backport for NewImpact: fix undefined log function (T341865) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:27 taavi: taavi@mwmaint1002 ~ $ mwscript namespaceDupes.php --wiki huwiktionary --fix # T341926
  • 13:27 taavi@deploy1002: Started scap: Backport for NewImpact: fix undefined log function (T341865)
  • 13:26 taavi@deploy1002: Finished scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) (duration: 19m 48s)
  • 13:25 taavi: taavi@deploy1002 ~ $ mwscript namespaceDupes.php --wiki mnwwiktionary --fix # T341940
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1002.eqiad.wmnet
  • 13:17 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1002.eqiad.wmnet
  • 13:16 taavi@deploy1002: taavi and anzx: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqia
  • 13:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 13:13 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 13:12 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 13:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 13:08 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 13:07 fabfur: enabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983) (hosts will run puppet with the usual schedule)
  • 13:06 taavi@deploy1002: Started scap: Backport for change wgExtraNamespaces , wgNamespaceAliases for mnwwiktionary (T341940), Add appendix namespace aliases on huwiktionary (T341926), robots.txt: Disable indexing draft-related pages on knwiki (T341958)
  • 13:04 fabfur: run puppet on cp2027 to deploy https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
  • 13:03 sukhe: run authdns-update to depool esams
  • 12:58 fabfur: disabled puppet on A:cp-codfw to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938840 (T340983)
  • 12:54 elukey@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:54 elukey@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:53 elukey@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:42 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcontrol1005.wikimedia.org
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 11:46 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 11:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 11:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 11:45 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 11:39 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 11:38 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 11:36 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 11:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 11:35 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host analytics1070.eqiad.wmnet with OS bullseye
  • 11:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 11:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 11:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 11:26 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcontrol1005.wikimedia.org decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:23 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 11:18 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 11:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 11:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 11:12 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudcontrol1005.wikimedia.org
  • 11:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
  • 11:10 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 11:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 11:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 11:08 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 11:07 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on analytics1070.eqiad.wmnet with reason: host reimage
  • 10:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 10:54 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 10:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 10:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 10:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host datahubsearch1001.eqiad.wmnet
  • 10:45 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host datahubsearch1001.eqiad.wmnet
  • 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 10:44 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host analytics1070.eqiad.wmnet with OS bullseye
  • 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 10:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1010.eqiad.wmnet
  • 10:33 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 10:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 10:32 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host stat1010.eqiad.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2048.codfw.wmnet
  • 10:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 10:24 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 10:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 10:23 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
  • 10:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 10:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 10:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 10:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 10:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
  • 10:08 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 10:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
  • 09:59 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
  • 09:51 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
  • 09:50 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
  • 09:48 fabfur: enabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983) (hosts will run puppet with the usual schedule)
  • 09:46 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 09:44 fabfur: disabled puppet on A:cp hosts in ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938807 (T340983)
  • 09:43 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 09:42 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kafka-stretch1001.eqiad.wmnet
  • 09:42 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 09:39 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 09:38 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 09:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 09:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 09:30 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 09:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 09:27 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 09:26 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2045.codfw.wmnet
  • 09:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 09:19 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1044.eqiad.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:17 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 09:02 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 09:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 08:51 fabfur: enable puppet on A:cp-eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
  • 08:37 fabfur: enable puppet on cp5024 and cp5032 to deploy 938002
  • 08:30 fabfur: disable puppet on all cp* hosts in eqsin to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/938002 (T340983)
  • 04:33 hashar@deploy1002: Finished deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484 (duration: 00m 08s)
  • 04:33 hashar@deploy1002: Started deploy [gerrit/gerrit@1153a16]: wm-checks-api: check undefined real_author (2) - T328484
  • 04:09 hashar@deploy1002: Finished deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484 (duration: 00m 08s)
  • 04:08 hashar@deploy1002: Started deploy [gerrit/gerrit@cad3002]: wm-checks-api: check undefined real_author - T328484

2023-07-16

  • 23:23 eileen: civicrm upgraded from 562224c1 to 2c60d58d
  • 17:20 apergos: starting rsync of sql/xml dumps files, pulling from dumpsdata1004, running in ariel screen session on dumpsdata1007, bw limited to 1G

2023-07-14

  • 19:57 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:56 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:55 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:55 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host an-worker1153.eqiad.wmnet with OS bullseye
  • 19:19 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860 (duration: 00m 14s)
  • 19:19 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@37d3ad6]: Run page_content_change_to_wikitext_raw DAG serially. T335860
  • 18:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1153.eqiad.wmnet with OS bullseye
  • 16:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 16:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:25 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 10:43 jelto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 10:42 jelto@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 10:41 jelto@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 10:40 jelto@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 10:38 jelto@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 10:38 jelto@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 09:02 klausman@puppetmaster1001: conftool action : set/pooled=inactive; selector: name=ores2003.codfw.wmnet
  • 09:02 klausman: Setting ores2003 to pooled=inactive while we attempt repairs/decide on decom
  • 08:51 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 08:48 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 08:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 08:47 _joe_: deploying to mw on k8s for T341825
  • 08:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 07:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:13 hashar@deploy1002: Finished deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543 (duration: 00m 08s)
  • 07:13 hashar@deploy1002: Started deploy [integration/docroot@56b5745]: Add mwbot-rs to doc.wikimedia.org - T341543
  • 07:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: sync
  • 07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: sync
  • 07:06 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:04 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:04 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:24 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:16 oblivian@deploy1002: Started scap: (no justification provided)

2023-07-13

  • 20:59 inflatador: bking@cumin1001 'disable puppet on hosts using zookeeper class T341792'
  • 20:37 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@889e13f]: (no justification provided) (duration: 00m 23s)
  • 20:37 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@889e13f]: (no justification provided)
  • 20:29 taavi@deploy1002: Finished scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) (duration: 07m 38s)
  • 20:23 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:22 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
  • 20:19 taavi@deploy1002: func and taavi: Backport for Avoid calling wfMessage in the hook handler constructor (T339272) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:17 taavi@deploy1002: Started scap: Backport for Avoid calling wfMessage in the hook handler constructor (T339272)
  • 20:16 taavi@deploy1002: Finished scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) (duration: 07m 47s)
  • 20:10 taavi@deploy1002: taavi and arlolra: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:08 taavi@deploy1002: Started scap: Backport for Set default for UseLegacyMediaStyles and disable on officewiki (T318433)
  • 18:11 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint (duration: 03m 39s)
  • 18:08 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92] (aqs-next): Deploying new AQS endpoint
  • 18:06 milimetric@deploy1002: Finished deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint (duration: 00m 05s)
  • 18:06 milimetric@deploy1002: Started deploy [analytics/aqs/deploy@91f8d92]: Deploying new AQS endpoint
  • 16:40 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 16:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 16:15 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:14 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 16:13 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 15:56 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 15:56 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 15:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest2002
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
  • 15:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest2002 decommissioned, removing all IPs except the asset tag one - pt1979@cumin2002"
  • 15:30 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:25 pt1979@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest2002
  • 15:18 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:17 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:45 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 14:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 14:43 elukey: depool ores2003 to allow DCops maintenance work
  • 14:43 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 14:43 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on ores2003.codfw.wmnet with reason: DCops working on it
  • 14:32 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:32 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:28 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:28 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:26 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:19 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 14:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 13:21 urbanecm: UTC afternoon B&C window done
  • 13:19 urbanecm: Run `mwscript namespaceDupes.php --wiki=mnwwiktionary --fix` (T330689)
  • 13:17 urbanecm@deploy1002: Finished scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) (duration: 09m 46s)
  • 13:09 urbanecm@deploy1002: anzx and urbanecm: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:08 urbanecm@deploy1002: Started scap: Backport for Create Reconstruction and Rhymes namespaces in mnwwiktionary (T330689)
  • 12:47 Lucas_WMDE: Start `mwscript DiscussionTools:persistRevisionThreadItems ruwiki --current --all --start '["10086120"]'; touch ~/T315510-ruwiki-exited-$?` in tmux on mwmaint1002 (T315510)
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:38 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:32 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1014.eqiad.wmnet
  • 11:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 11:29 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1014.eqiad.wmnet
  • 10:57 vgutierrez: restarting pybal on lvs1020
  • 10:35 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:34 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:19 fabfur: puppet enabled on cp3052 and cp5017 and new configuration applied (https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701)
  • 10:15 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 10:15 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 10:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 10:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 10:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 10:11 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:07 fabfur: disable puppet on cp3052 and cp5017 to safely monitor https://gerrit.wikimedia.org/r/c/operations/puppet/+/936701
  • 10:04 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:04 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:11 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:11 elukey: increased kafka partitions for mediawiki.job.cirrusSearchLinksUpdate and mediawiki.job.cirrusSearchLinksUpdate (eqiad/codfw) - T341558
  • 09:10 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:09 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 09:09 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 09:04 XioNoX: update NAT on pfw3-eqiad - T340252
  • 08:14 hashar: Restarting CI Jenkins for plugin installation
  • 07:55 apergos: UTC morning backport and config deployment window done
  • 07:53 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) (duration: 08m 20s)
  • 07:47 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the logo and add a wordmark (T341177) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:45 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the logo and add a wordmark (T341177)
  • 07:40 ariel@deploy1002: Finished scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) (duration: 09m 43s)
  • 07:32 ariel@deploy1002: ariel and superpes: Backport for [idwikiquote] Change the sitename and the project namespace (T341177) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:31 ariel@deploy1002: Started scap: Backport for [idwikiquote] Change the sitename and the project namespace (T341177)
  • 07:27 ariel@deploy1002: Finished scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) (duration: 08m 39s)
  • 07:20 ariel@deploy1002: ariel and superpes: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:18 ariel@deploy1002: Started scap: Backport for [mywiki] Create 'autopatrolled' and 'patroller' usergroups (T341026)
  • 07:14 ariel@deploy1002: Finished scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) (duration: 08m 35s)
  • 07:07 ariel@deploy1002: superpes and ariel: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 07:05 ariel@deploy1002: Started scap: Backport for [knwiki] Reverting the temporary logo and updating logo/wordmark/tagline (T338136)
  • 04:22 eileen: civicrm upgraded from 4521c00a to 562224c1
  • 03:05 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 02:52 eileen: civicrm upgraded from 882e2310 to 4521c00a
  • 02:33 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[18,25,26,27,30,33].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 01:31 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:53 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:52 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[17,22,23,24,29,32].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 00:47 eileen: config revision changed from ccc33b1e to e3e5a11d - re-enabled jobs
  • 00:33 eileen: civicrm upgraded from 1bfc3b17 to 882e2310
  • 00:17 eileen: drush @wmff civicrm-upgrade-db
  • 00:08 eileen: config revision changed from 6ac88ea8 to ccc33b1e (I pushed the upgrade code out)

2023-07-12

  • 23:59 eileen: config revision changed from c543419d to ccc33b1e
  • 23:57 eileen: config revision changed from 6ac88ea8 to c543419d
  • 23:47 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 23:27 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 23:07 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase10[16,19,20,21,28,31].eqiad.wmnet: Applying JVM update - eevans@cumin1001
  • 22:18 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:49 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:43 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[12,17,18,23,26,27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:33 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:32 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 21:32 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 21:09 TheresNoTime: close UTC late backport window
  • 21:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 21:08 samtar@deploy1002: Finished scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) (duration: 08m 10s)
  • 21:02 samtar@deploy1002: stang and samtar: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 21:00 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 21:00 samtar@deploy1002: Started scap: Backport for ruwikibooks: Add NS104 to wgNamespacesToBeSearchedDefault (T341708)
  • 21:00 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 20:59 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:59 samtar@deploy1002: Finished scap: Backport for Fix mediawiki.special_diff_interactions configuration (duration: 08m 47s)
  • 20:52 samtar@deploy1002: samtar and urbanecm: Backport for Fix mediawiki.special_diff_interactions configuration synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:50 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[15-16,20,22,25].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:50 samtar@deploy1002: Started scap: Backport for Fix mediawiki.special_diff_interactions configuration
  • 20:49 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 26m 41s)
  • 20:48 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:29 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[14,21,24].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:29 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:24 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:24 samtar@deploy1002: jsn and samtar: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:23 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:22 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) (duration: 09m 27s)
  • 20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 20:20 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 20:17 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13,19].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:13 samtar@deploy1002: samtar: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:11 samtar@deploy1002: Started scap: Backport for Fix Error: Module "./ext.pageTriage.defaultTagsOptions.js" is not loaded (T340112)
  • 20:10 samtar@deploy1002: Finished scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) (duration: 08m 04s)
  • 20:08 eevans@cumin1001: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 20:04 samtar@deploy1002: samtar: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for [ruwiki] Add permissions to 'editor' usergroup (T341707)
  • 19:57 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase2013.codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 18:39 dduvall@deploy1002: Synchronized php: group1 wikis to 1.41.0-wmf.17 refs T340245 (duration: 06m 16s)
  • 18:33 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.17 refs T340245
  • 18:24 dduvall@deploy1002: Finished scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) (duration: 08m 22s)
  • 18:17 dduvall@deploy1002: abi and dduvall: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 18:16 dduvall@deploy1002: Started scap: Backport for QueryMessageGroupActionApi: Apply sorting to groups only (T341627)
  • 17:10 sukhe: restart pybal on lvs1018
  • 17:07 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum6001.drmrs.wmnet with OS bullseye
  • 16:59 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 16:59 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 16:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:47 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 16:42 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-a,name=dbproxy1018.eqiad.wmnet
  • 16:42 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-a,name=dbproxy1019.eqiad.wmnet
  • 16:41 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 16:41 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dbproxy1013.eqiad.wmnet
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 16:38 ladsgroup@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dbproxy1013.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ladsgroup@cumin1001"
  • 16:37 btullis@puppetmaster1001: conftool action : set/pooled=no; selector: service=wikireplicas-b,name=dbproxy1019.eqiad.wmnet
  • 16:37 btullis@puppetmaster1001: conftool action : set/pooled=yes; selector: service=wikireplicas-b,name=dbproxy1018.eqiad.wmnet
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 16:32 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided) (duration: 00m 58s)
  • 16:31 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@a0e00cb] (releasing): (no justification provided)
  • 16:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1013.eqiad.wmnet
  • 16:21 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bullseye
  • 16:21 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host durum6001.drmrs.wmnet with OS bookworm
  • 16:01 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 15:57 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on durum6001.drmrs.wmnet with reason: host reimage
  • 15:43 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:42 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:42 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:35 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:34 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:11 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 15:08 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host durum6001.drmrs.wmnet with OS bookworm
  • 15:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:05 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
  • 15:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:49 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656) (duration: 08m 03s)
  • 14:48 sukhe: upgrade dns2004 to gdnsd 3.99.0~alpha2
  • 14:42 lucaswerkmeister-wmde@deploy1002: tgr and lucaswerkmeister-wmde: Backport for Temporarily allow OAuth on non-API entry points again (T341656) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:41 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Temporarily allow OAuth on non-API entry points again (T341656)
  • 14:17 btullis@cumin1001: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 14:11 btullis@cumin1001: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
  • 14:07 sukhe: dns4003: upgrade to pdns-rec 4.8.4: T341611
  • 13:59 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:57 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:56 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:56 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:46 sukhe: doh6001: upgrade to pdns-rec 4.8.4: T341611
  • 13:44 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 13:43 jiji@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 13:42 sukhe: reprepro -C main include bullseye-wikimedia pdns-recursor_4.8.4-1+wmf11u1_amd64.changes: T341611
  • 13:40 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:38 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add new campaign_events.event_answers_status column (T341142) (duration: 07m 59s)
  • 13:34 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:31 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:31 lucaswerkmeister-wmde@deploy1002: daimona and lucaswerkmeister-wmde: Backport for Add new campaign_events.event_answers_status column (T341142) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:30 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add new campaign_events.event_answers_status column (T341142)
  • 13:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) (duration: 07m 40s)
  • 13:28 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 47s)
  • 13:27 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:27 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 30s)
  • 13:24 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 02m 09s)
  • 13:23 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for Html: Support more attr types in getTextInputAttributes() (T341566) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:22 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:21 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Html: Support more attr types in getTextInputAttributes() (T341566)
  • 13:06 moritzm: installing node-tough-cookie security updates
  • 12:54 moritzm: rebalance ganeti codfw/D following reboots
  • 12:52 moritzm: imported wikidiff2 1.14.1-0+wmf1+buster1+icu67u1 to component/icu67 T340087 T329491
  • 12:44 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:30 akosiaris: upgrade wikidiff2 1.13.0-1+wmf1+buster1 -> 1.14.1-0+wmf1+buster1 on mw-canary hosts T340087
  • 12:11 mvernon@cumin2002: conftool action : set/pooled=no; selector: name=thanos-fe1002.eqiad.wmnet,service=thanos-web
  • 11:50 moritzm: installing apache2 security updates on Bullseye
  • 11:48 moritzm: installing wireshark security updates
  • 11:44 moritzm: rebalance ganeti codfw/C following reboots
  • 11:43 hnowlan: rebuilding fluent-bit image to include wmf-certificates
  • 11:33 ariel@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 11:26 ariel@cumin1001: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 11:00 ladsgroup@deploy1002: Finished scap: Backport for fix: add request headers properly (T319170) (duration: 10m 20s)
  • 10:51 ladsgroup@deploy1002: ladsgroup: Backport for fix: add request headers properly (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:50 ladsgroup@deploy1002: Started scap: Backport for fix: add request headers properly (T319170)
  • 10:49 ladsgroup@deploy1002: Finished scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) (duration: 08m 09s)
  • 10:42 ladsgroup@deploy1002: ladsgroup: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:40 ladsgroup@deploy1002: Started scap: Backport for Externallinks: Keep domain wildcard if path is not specified (T326251)
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:34 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:33 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:55 moritzm: move secondary instances away from ganeti2014 T341546
  • 07:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 07:29 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 07:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 07:26 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:55 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:55 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 06:47 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:45 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.tls (exit_code=0)
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:37 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:37 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:18 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:18 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:16 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:16 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:11 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:11 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 06:06 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.tls (exit_code=99)
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.tls
  • 04:36 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 04:35 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 03:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:41 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:39 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:29 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:29 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:16 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:15 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 03:14 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 03:14 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 01:47 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host
  • 01:46 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 6 days, 19:00:00 on wdqs[2013,2022].codfw.wmnet with reason: new host

2023-07-11

  • 21:51 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:45 urbanecm@deploy1002: Finished scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) (duration: 11m 10s)
  • 20:35 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:33 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary, Drop idwiktionary wordmark, Always return the class as string from Html::getTextInputAttributes (T341566)
  • 20:32 urbanecm@deploy1002: Sync cancelled.
  • 20:26 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Logos: Fixes grantswiki and idwiktionary synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:25 urbanecm@deploy1002: Started scap: Backport for Logos: Fixes grantswiki and idwiktionary
  • 20:16 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 20:14 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 19:49 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:48 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 18:57 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.17 refs T340245
  • 18:46 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:53 dduvall@deploy1002: Pruned MediaWiki: 1.41.0-wmf.15 (duration: 02m 16s)
  • 17:50 dduvall@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.17 refs T340245 (duration: 45m 50s)
  • 17:05 dduvall@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.17 refs T340245
  • 16:52 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 16:28 vgutierrez: reenabling puppet in cp6002
  • 16:24 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 16:08 sukhe: upgrade dns1004 to gdnsd 3.99.0~alpha2
  • 16:04 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 16:03 Lucas_WMDE: previous backport also included Remove oversampling for Navigation Timing extension. (T337858)
  • 16:02 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Add option for html label in Menu template (T340217) (duration: 09m 15s)
  • 15:54 lucaswerkmeister-wmde@deploy1002: jdlrobson and lucaswerkmeister-wmde: Backport for Add option for html label in Menu template (T340217) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 15:54 Krinkle: Deployed https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/930712 ("Remove oversampling for Navigation Timing extension.")
  • 15:53 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Add option for html label in Menu template (T340217)
  • 15:48 krinkle@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC (duration: 17m 03s)
  • 15:31 krinkle@deploy1002: Locking from deployment [ALL REPOSITORIES]: pending security problem, see mediawiki_security IRC
  • 15:26 krinkle@deploy1002: Sync cancelled.
  • 15:24 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase20[13-27].codfw.wmnet: Applying JVM update - eevans@cumin1001
  • 15:22 krinkle@deploy1002: phedenskog and krinkle: Backport for Remove oversampling for Navigation Timing extension. (T337858) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 15:21 krinkle@deploy1002: Started scap: Backport for Remove oversampling for Navigation Timing extension. (T337858)
  • 15:17 eevans@cumin1001: END (FAIL) - Cookbook sre.cassandra.roll-restart (exit_code=99) for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
  • 15:09 eevans@cumin1001: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Applying JVM update - eevans@cumin1001
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 14:21 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 14:19 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-mirror-maker (exit_code=0) restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:17 moritzm: restarting apache on mw canaries
  • 14:17 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 14:15 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 14:12 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 14:02 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:00 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-mirror-maker restart MirrorMaker for Kafka A:kafka-mirror-maker-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 13:59 moritzm: installing yajl security updates
  • 13:59 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:57 btullis@cumin1001: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 13:49 moritzm: rebalance ganeti group eqiad/d after reboots
  • 13:42 jgiannelos@deploy1002: Finished deploy [restbase/deploy@930f075]: (no justification provided) (duration: 19m 50s)
  • 13:33 urbanecm@deploy1002: Finished scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) (duration: 11m 33s)
  • 13:23 urbanecm@deploy1002: urbanecm and anzx: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:22 jgiannelos@deploy1002: Started deploy [restbase/deploy@930f075]: (no justification provided)
  • 13:21 urbanecm@deploy1002: Started scap: Backport for Enable tabs for non loggedin mobile users on knwikisource (T340276)
  • 13:21 urbanecm@deploy1002: Finished scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399) (duration: 07m 15s)
  • 13:14 urbanecm@deploy1002: Started scap: Backport for Growth: Increase mentorship percentage to 25% on enwiki (T341399)
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) (duration: 09m 45s)
  • 13:05 urbanecm@deploy1002: sgimeno and urbanecm: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:03 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Enable backend of link recommendation 10, 11, 12th round wikis (T308135 T308136 T308137)
  • 13:00 jbond@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 12:59 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 12:59 jbond@cumin1001: END (ERROR) - Cookbook sre.postgresql.postgres-init (exit_code=97)
  • 12:53 jbond@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 12:00 XioNoX: decom datahop in knams - T340049
  • 11:42 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/datahub: sync on main
  • 11:38 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/datahub: apply on main
  • 11:37 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:27 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 11:17 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/datahub: sync on main
  • 11:06 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/datahub: apply on main
  • 10:46 moritzm: installing libx11 security updates
  • 10:44 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:44 ladsgroup@deploy1002: Sync cancelled.
  • 10:39 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 10:38 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 10:37 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make oneWildcard avoid adding wildcard to domain (T326251)
  • 10:19 moritzm: rebalance ganeti group codfw/C after reboots
  • 10:03 ladsgroup@deploy1002: Finished scap: Backport for Override liftwing hostname (T319170) (duration: 14m 34s)
  • 09:52 ladsgroup@deploy1002: ladsgroup: Backport for Override liftwing hostname (T319170) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:52 jbond: disable puppet fleet wide to deploy 936273
  • 09:49 ladsgroup@deploy1002: Started scap: Backport for Override liftwing hostname (T319170)
  • 09:47 jbond: re-enable puppet
  • 09:43 hashar: Updating Zuul configuration, which was stalled at a version from March 29th after the switchover from contint2001 to contint2002 | T324659 T341556
  • 09:36 jbond: deploy gerrit:936273 enable submitting data to puppetdb7
  • 09:30 jbond: disable puppet fleet wide to deploy 936273
  • 09:08 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host kafkamon1003.eqiad.wmnet
  • 09:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 09:06 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 09:06 jayme: enabled puppet on 'P{R:Package = envoyproxy}'
  • 09:01 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 09:01 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 08:59 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 08:59 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host kafkamon1003.eqiad.wmnet
  • 08:59 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 08:43 volans: previous downtiming completed
  • 08:40 volans: downtiming service 'Check no envoy runtime configuration is left persistent' on envoy hosts
  • 08:39 jayme: disabled puppet on 'P{R:Package = envoyproxy}'
  • 08:19 godog: upgrade prometheus to 2.24.1+ds-1+wmf2 on cloudmetrics*
  • 08:03 hashar: Stopping Jenkins and Zuul for server switch over
  • 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2002.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 08:01 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on contint2001.wikimedia.org with reason: Switch contint hosts for hardware replacement
  • 07:55 kart_: Updated MinT to 2023-07-10-051738-production (T341335, T333969)
  • 07:54 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:49 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:42 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:38 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:36 moritzm: failover broken ganeti2014 node
  • 07:28 moritzm: powercycle ganeti2014
  • 07:22 moritzm: installing libxpm security updates
  • 07:08 moritzm: rebalance ganeti in drmrs after reboots
  • 06:59 elukey: restart kube-apiserver on ml-serve-ctrl1* as an attempt to resolve spikes in latencies
  • 06:36 moritzm: rebalance ganeti group eqiad/B after reboots
  • 05:24 rzl: imported otelcol-contrib 0.81.0 to buster-wikimedia and bullseye-wikimedia in component thirdparty/otelcol-contrib
  • 04:34 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 02:05 mutante: LDAP - added urbanecm to wmf group, removed from nda group (conversion volunteer to staff) T341443

2023-07-10

  • 23:11 Krinkle: krinkle@xhgui1001$ Define new `xhgui.watches` table via xhguiadmin@m2-master.eqiad.wmnet database, ref T341499
  • 22:12 maryum: Deployed security patch for T340200
  • 21:42 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 21:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 21:37 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 52s)
  • 21:36 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:46 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 20:43 TheresNoTime: close UTC late backport window
  • 20:42 samtar@deploy1002: Finished scap: Backport for Revert "log additional events on Special:Diff|MobileDiff" (duration: 07m 27s)
  • 20:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 20:36 samtar@deploy1002: samtar: Backport for Revert "log additional events on Special:Diff|MobileDiff" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 20:34 samtar@deploy1002: Started scap: Backport for Revert "log additional events on Special:Diff|MobileDiff"
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49544 and previous config saved to /var/cache/conftool/dbconfig/20230710-202536-ladsgroup.json
  • 20:23 samtar@deploy1002: Finished scap: Backport for log additional events on Special:Diff|MobileDiff (T326212) (duration: 21m 42s)
  • 20:23 inflatador: bking@wdqs1006 Restart wdqs-blazegraph to hopefully clear the free allocators alerts
  • 20:19 TheresNoTime: syncing https://gerrit.wikimedia.org/r/c/936748 untested (T326212) for test after sync
  • 20:14 mutante: miscweb1003/miscweb2003 - rm -rf /srv/org/wikimedia/static-tendril
  • 20:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49541 and previous config saved to /var/cache/conftool/dbconfig/20230710-201031-ladsgroup.json
  • 20:07 eileen: civicrm upgraded from 0ddd1a51 to 7caf5274
  • 20:03 samtar@deploy1002: samtar and jsn: Backport for log additional events on Special:Diff|MobileDiff (T326212) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for log additional events on Special:Diff|MobileDiff (T326212)
  • 20:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1124.eqiad.wmnet with reason: Reboot
  • 19:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49540 and previous config saved to /var/cache/conftool/dbconfig/20230710-195527-ladsgroup.json
  • 19:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 19:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2112 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49538 and previous config saved to /var/cache/conftool/dbconfig/20230710-194022-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2112.codfw.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2112 T341511', diff saved to https://phabricator.wikimedia.org/P49537 and previous config saved to /var/cache/conftool/dbconfig/20230710-191511-ladsgroup.json
  • 19:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2103 to s1 primary T341511', diff saved to https://phabricator.wikimedia.org/P49536 and previous config saved to /var/cache/conftool/dbconfig/20230710-191259-ladsgroup.json
  • 19:12 Amir1: Starting s1 codfw failover from db2112 to db2103 - T341511
  • 18:59 sukhe: running authdns-update
  • 18:57 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
  • 18:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 sukhe: finished commissioning new DNS hosts in eqiad: dns100[4-6]. decommissioned dns100[1-3].
  • 18:55 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns[1002-1003].wikimedia.org
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:50 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dbproxy1012.eqiad.wmnet
  • 18:49 ladsgroup@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:49 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns[1002-1003].wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 18:46 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2103 with weight 0 T341511', diff saved to https://phabricator.wikimedia.org/P49535 and previous config saved to /var/cache/conftool/dbconfig/20230710-184521-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
  • 18:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 37 hosts with reason: Primary switchover s1 T341511
  • 18:43 ladsgroup@cumin1001: START - Cookbook sre.dns.netbox
  • 18:38 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns[1002-1003].wikimedia.org
  • 18:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.decommission for hosts dbproxy1012.eqiad.wmnet
  • 18:32 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:31 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:29 dzahn@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:26 dzahn@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:03 dzahn@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:55 dzahn@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:51 sukhe: homer "mr*" commit "update ntp_servers (remove dns100[2-3], add dns100[5-6])"
  • 17:26 dzahn@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:24 dzahn@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 16:52 sukhe: rolling restart of ntp.service on A:dns-rec
  • 16:44 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936757 remove DNS hosts dns1002 and dns1003"
  • 16:26 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.154.153 208.80.154.77 ]
  • 16:26 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model (duration: 00m 20s)
  • 16:25 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8fa416b]: T328276: Change articletopic source to the outlink model
  • 16:07 taavi@deploy1002: Finished scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) (duration: 07m 47s)
  • 16:00 taavi@deploy1002: taavi: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 15:59 taavi@deploy1002: Started scap: Backport for wikitech: Update codfw1dev LDAP server hostname, Disable UrlShortener on wikitech (T341470)
  • 15:53 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 30s)
  • 15:46 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 31s)
  • 15:30 sukhe: homer "cr*-eqiad*" commit "Gerrit: 936720 add new DNS host dns1006"
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1006.wikimedia.org with OS bullseye
  • 15:25 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:23 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 15:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 15:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 15:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 15:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 15:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1006.wikimedia.org with reason: host reimage
  • 15:00 moritzm: rebalance ganeti group eqiad/A after reboots
  • 14:57 tchin@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 tchin@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:51 tchin@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:51 tchin@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1006.wikimedia.org with OS bullseye
  • 14:46 tchin@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:46 tchin@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:33 fabfur: add new dns host dns1005
  • 14:28 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:28 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 sukhe@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
  • 14:26 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "running manually for dns1005 - sukhe@cumin1001"
  • 14:23 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1005.wikimedia.org with OS bullseye
  • 14:23 sukhe@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
  • 14:22 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 sukhe@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin1001"
  • 14:19 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:15 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:10 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:10 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:04 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host contint2002.wikimedia.org
  • 14:02 sukhe@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 14:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 14:01 ladsgroup@deploy1002: Finished scap: Backport for Set commons to READ_NEW for externallinks migration (T335343) (duration: 09m 22s)
  • 13:59 sukhe@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1005.wikimedia.org with reason: host reimage
  • 13:58 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host contint2002.wikimedia.org
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 13:53 ladsgroup@deploy1002: ladsgroup: Backport for Set commons to READ_NEW for externallinks migration (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:52 btullis@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host karapace1002.eqiad.wmnet
  • 13:52 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host karapace1002.eqiad.wmnet with OS bullseye
  • 13:51 ladsgroup@deploy1002: Started scap: Backport for Set commons to READ_NEW for externallinks migration (T335343)
  • 13:47 sukhe@cumin1001: START - Cookbook sre.hosts.reimage for host dns1005.wikimedia.org with OS bullseye
  • 13:46 ladsgroup@deploy1002: Finished scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) (duration: 11m 03s)
  • 13:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 13:39 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
  • 13:36 ladsgroup@deploy1002: ladsgroup: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:36 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on karapace1002.eqiad.wmnet with reason: host reimage
  • 13:35 ladsgroup@deploy1002: Started scap: Backport for ExternalLinks: Make order by and continue only rely on el_id in READ NEW (T341000 T47237)
  • 13:28 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 13:28 ladsgroup@deploy1002: Finished scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) (duration: 09m 02s)
  • 13:27 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 13:27 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 13:27 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host karapace1002.eqiad.wmnet with OS bullseye
  • 13:22 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:21 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) karapace1002.eqiad.wmnet on all recursors
  • 13:21 btullis@cumin1001: START - Cookbook sre.dns.wipe-cache karapace1002.eqiad.wmnet on all recursors
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 btullis@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:20 btullis@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM karapace1002.eqiad.wmnet - btullis@cumin1001"
  • 13:20 ladsgroup@deploy1002: isaranto and ladsgroup: Backport for ores extension: deploy LiftWing usage on testwiki (T319170) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:19 ladsgroup@deploy1002: Started scap: Backport for ores extension: deploy LiftWing usage on testwiki (T319170)
  • 13:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) (duration: 10m 26s)
  • 13:16 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:16 btullis@cumin1001: START - Cookbook sre.ganeti.makevm for new host karapace1002.eqiad.wmnet
  • 13:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on group2 wikis (T314318)
  • 12:34 claime: Running puppet on cp-text hosts - T341463
  • 12:33 claime: Sending 1% of global traffic to mw-on-k8s - T341463
  • 12:04 moritzm: failover ganeti masters in drmrs
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 11:55 moritzm: installing avahi security updates
  • 11:52 vgutierrez: repool cp2037 (debugging finished) - T320967
  • 11:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 11:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:28 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:23 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 11:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 28 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP
  • 11:14 moritzm: remove unused VM netflow6002 T330884
  • 11:11 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti6003.drmrs.wmnet
  • 11:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 10:55 moritzm: failover ganeti master in eqiad to ganeti1029
  • 10:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 10:50 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 10:50 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 10:49 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 10:45 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 10:44 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: sync
  • 10:44 elukey@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: sync
  • 10:43 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for cp2037.codfw.wmnet
  • 10:43 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for cp2037.codfw.wmnet
  • 10:34 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 10:25 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1002.eqiad.wmnet with reason: host reimage
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 10:13 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:12 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=parsoid,name=parse1012.*
  • 10:12 claime: repooling parse1012.eqiad.wmnet
  • 10:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 10:11 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:05 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1002.eqiad.wmnet with OS bullseye
  • 10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 10:03 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 10:02 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - aborrero@cumin1001"
  • 09:53 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:50 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
  • 09:50 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on cp2037.codfw.wmnet with reason: vgutierrez debugging
  • 09:44 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
  • 09:44 aborrero@cumin1001: START - Cookbook sre.dns.wipe-cache cloudlb1002.private.eqiad.wikimedia.cloud on all recursors
  • 09:39 moritzm: rebalance ganeti group codfw/B after reboots
  • 09:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
  • 09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 09:35 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudlb1001.eqiad.wmnet with reason: host reimage
  • 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:35 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
  • 09:33 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1002 - aborrero@cumin1001"
  • 09:31 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:29 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
  • 09:28 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb1001 - aborrero@cumin1001"
  • 09:25 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 09:23 moritzm: rebalance ganeti group codfw/A after reboots
  • 09:14 vgutierrez: depool cp2037 (debugging ATS cacheability issues) - T320967
  • 09:12 moritzm: restarting mw canaries to pick up libxpm security update
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-eqiad
  • 09:07 moritzm: installing cups security updates (libs only)
  • 09:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 09:04 moritzm: rebalance ganeti clusters in esams/ulsfo/eqsin following reboots
  • 09:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 08:58 lucaswerkmeister-wmde: Deployed security patch for T340220
  • 08:57 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-eqiad
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.aqs.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:aqs-codfw
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 08:48 moritzm: installing libxpm security updates
  • 08:47 kart_: Updated cxserver to 2023-07-10-065135-production (T337719, T340989)
  • 08:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:45 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:44 jmm@cumin2002: START - Cookbook sre.aqs.roll-restart-reboot rolling restart_daemons on A:aqs-codfw
  • 08:41 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:40 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1030.eqiad.wmnet
  • 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 08:27 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:27 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 08:24 claime: Running puppet on cp-text hosts - T337489
  • 08:11 hashar: UTC morning backport window completed.
  • 08:11 hashar@deploy1002: Finished scap: Backport for Deploy action blocks on bnwiki (T340904) (duration: 08m 15s)
  • 08:04 moritzm: installing c-ares security updates on buster
  • 08:04 hashar@deploy1002: hashar and mdsshakil: Backport for Deploy action blocks on bnwiki (T340904) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:03 hashar@deploy1002: Started scap: Backport for Deploy action blocks on bnwiki (T340904)
  • 08:02 hashar@deploy1002: Finished scap: Backport for thwiki: Update logos from commons (T341407) (duration: 25m 32s)
  • 08:00 moritzm: installing flask security updates on bullseye
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 07:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 07:45 hashar@deploy1002: func and hashar: Backport for thwiki: Update logos from commons (T341407) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 07:36 hashar@deploy1002: Started scap: Backport for thwiki: Update logos from commons (T341407)
  • 07:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 07:30 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:30 moritzm: installing libgstreamer-plugins-base1.0-0 security updates
  • 07:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 07:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 07:22 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:21 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:21 hashar: deploy1002: removed empty untracked directory from MediaWiki staging area: `rmdir /srv/mediawiki-staging/wmf-config/scap/log/ && rmdir /srv/mediawiki-staging/wmf-config/scap/` | T341292
  • 07:20 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 07:20 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
  • 07:02 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 07:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 06:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 06:43 godog: add 100G to prometheus/k8s in codfw
  • 01:06 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 01:06 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-09

  • 14:51 apergos: swapped dumpsdata1003 in as the new nfs share for misc dumps; dumpsdata1002 is now a spare, to be decommissioned. 1003 is running bullseye.
  • 04:04 apergos: rsync misc dumps output files from dumpsdata1002 to 1003, in ariel screen session on 1003, bwlimit to 1G

2023-07-08

  • 03:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-07

  • 22:55 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 22:55 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 22:41 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 22:21 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 22:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1156.eqiad.wmnet with OS bullseye
  • 21:59 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 21:24 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 57s)
  • 21:23 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 21:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:53 dwisehaupt@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 20:52 dwisehaupt@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1001"
  • 20:50 dwisehaupt@cumin1001: START - Cookbook sre.dns.netbox
  • 20:44 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host an-worker1156.eqiad.wmnet with OS bullseye
  • 19:33 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:33 bking@cumin1001: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 19:32 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 19:12 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:11 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:08 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 17:57 pt1979@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 17:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 17:40 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 16:44 pt1979@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 16:38 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 16:23 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:20 hashar: Restarting CI Jenkins due to a confusion in the next build number leading to intermittent 404 when browsing console links | T341348
  • 16:00 bking@cumin1001: conftool action : set/pooled=no; selector: name=wdqs2020.codfw.wmnet
  • 15:53 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:51 bking@cumin1001: conftool action : set/pooled=yes; selector: name=wdqs2020.codfw.wmnet
  • 15:50 bking@cumin1001: conftool action : set/weight=10; selector: name=wdqs2020.codfw.wmnet
  • 15:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:47 aborrero@cumin1001: START - Cookbook sre.hosts.provision for host cloudlb1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:46 bking@cumin1001: conftool action : set/pooled=yes; selector: service=(wdqs|wdqs-ssl|wdqs-heavy-queries),name=wdqs2020.codfw.wmnet
  • 15:45 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 15:33 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:30 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:05 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 50s)
  • 15:04 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 49s)
  • 14:57 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 14:57 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:50 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 14:50 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:49 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 14:49 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 14:47 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 14:26 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 13:59 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 07s)
  • 13:59 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 13:58 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 13:58 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 12:50 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 12:17 hashar: Re-enabled zuul-merger on contint2001 and removed the Icinga maintenance window
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:02 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 12:01 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:45 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:42 hashar: Enabled zuul-merger contint1002, disabled it on contint2001 and marked that host as under maintenance in Icinga for the next two hours
  • 11:27 hashar: Stopped zuul-merger contint1002
  • 11:17 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:04 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikimediacloud - aborrero@cumin1001"
  • 11:02 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 10:13 moritzm: rebooting puppetdb1003
  • 10:09 moritzm: rebooting puppetserver1001
  • 10:06 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb2003.codfw.wmnet
  • 10:05 moritzm: rebooting puppetserver2001
  • 10:05 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:03 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow1002.eqiad.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb2003.codfw.wmnet
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow1002.eqiad.wmnet
  • 09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host debmonitor2003.codfw.wmnet
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow2003.codfw.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow2003.codfw.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor2003.codfw.wmnet
  • 09:45 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host debmonitor1003.eqiad.wmnet
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 09:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host debmonitor1003.eqiad.wmnet
  • 09:34 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host lists1003.wikimedia.org
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 09:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow6001.drmrs.wmnet
  • 09:29 stevemunene@cumin1001: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow6001.drmrs.wmnet
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow3002.esams.wmnet
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host lists1003.wikimedia.org
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people1004.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people1004.eqiad.wmnet
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow3002.esams.wmnet
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow5002.eqsin.wmnet
  • 09:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host people2003.codfw.wmnet
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host people2003.codfw.wmnet
  • 09:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow5002.eqsin.wmnet
  • 08:53 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:48 moritzm: installing bookworm kernel updates
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui2002.codfw.wmnet
  • 08:47 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui2002.codfw.wmnet
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: xhgui1002.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: xhgui1002.eqiad.wmnet
  • 08:05 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
  • 08:05 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test[1006-1010].eqiad.wmnet with reason: resetting cluster
  • 01:55 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 00:28 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer

2023-07-06

  • 23:14 mutante: mx1001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
  • 23:08 mutante: mx2001 - rm /usr/local/bin/otrs_aliases ; rm /lib/systemd/system/generate_otrs_aliases.* after deploying gerrit:932316 which renamed script and timer without absenting them
  • 21:12 thcipriani@deploy1002: Finished scap: Clean up font directory gerrit:723652 (duration: 06m 33s)
  • 21:10 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 14m 56s)
  • 21:06 thcipriani@deploy1002: Started scap: Clean up font directory gerrit:723652
  • 21:04 thcipriani@deploy1002: Finished scap: Backport for pawikibooks: Install Quiz extension (T340613) (duration: 12m 19s)
  • 20:55 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:54 bking@deploy1002: Finished deploy [wdqs/wdqs@dff41b7]: 0.3.124 (duration: 00m 05s)
  • 20:54 bking@deploy1002: Started deploy [wdqs/wdqs@dff41b7]: 0.3.124
  • 20:53 thcipriani@deploy1002: stang and thcipriani: Backport for pawikibooks: Install Quiz extension (T340613) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:51 thcipriani@deploy1002: Started scap: Backport for pawikibooks: Install Quiz extension (T340613)
  • 20:48 thcipriani@deploy1002: Finished scap: Backport for Update more logos with available SVGs (T338162) (duration: 12m 41s)
  • 20:37 thcipriani@deploy1002: jdlrobson and thcipriani: Backport for Update more logos with available SVGs (T338162) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:35 thcipriani@deploy1002: Started scap: Backport for Update more logos with available SVGs (T338162)
  • 20:16 thcipriani@deploy1002: Finished scap: Backport for Disable purging of old client hint data by default (T340959 T341076) (duration: 10m 08s)
  • 20:07 thcipriani@deploy1002: thcipriani and dreamyjazz: Backport for Disable purging of old client hint data by default (T340959 T341076) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:06 thcipriani@deploy1002: Started scap: Backport for Disable purging of old client hint data by default (T340959 T341076)
  • 19:24 urbanecm@deploy1002: Finished scap: Backport for PageView: Fix base URL when using service proxy (T341191) (duration: 07m 16s)
  • 19:17 urbanecm@deploy1002: Started scap: Backport for PageView: Fix base URL when using service proxy (T341191)
  • 19:06 bking@cumin1001: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 19:03 urbanecm@deploy1002: Finished scap: Backport for PageView: Route requests through restbase service proxy (T341191) (duration: 07m 27s)
  • 18:57 urbanecm@deploy1002: urbanecm: Backport for PageView: Route requests through restbase service proxy (T341191) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 18:56 urbanecm@deploy1002: Started scap: Backport for PageView: Route requests through restbase service proxy (T341191)
  • 17:33 cstone: tools upgraded from 2ca83336 to 10972e59
  • 17:24 sukhe: sudo cumin -b1 -s300 'A:dns-rec' 'systemctl restart ntp.service'
  • 17:17 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 17:15 sukhe: homer "mr*" commit "update ntp_servers (add dns1004, remove dns1001)"
  • 17:07 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:06 cstone: SmashPig upgraded from db23b998 to 95181a1b
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns1001.wikimedia.org
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:04 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:02 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin2002"
  • 17:00 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 16:58 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:58 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns1001.wikimedia.org
  • 16:49 sukhe: sudo cumin A:netbox 'run-puppet-agent': removing dns1001 before decomm cookbook
  • 16:44 sukhe: homer "cr*-eqiad*" commit "decommission DNS host dns1001 (replaced by dns1004)"
  • 16:31 sukhe: ns0: set routing-options static route 208.80.154.238/32 next-hop [ 208.80.154.6 208.80.155.108 208.80.154.134 ]
  • 16:30 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 16:30 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:16 sukhe: homer "cr*-eqiad*" commit "Gerrit: 933917 add new DNS host dns1004"
  • 16:12 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 16:11 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply
  • 15:54 elukey: changeprop's kafka linger.ms set to 20s - T338357 (was 5ms, now changeprop waits a bit more to batch messages to send to kafka in one go)
  • 15:53 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: sync
  • 15:53 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: sync
  • 15:47 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:47 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:45 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 15:45 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 15:36 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 15:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 15:35 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 15:29 sukhe: restart ntp.service on A:dns-rec
  • 15:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 15:25 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns1004.wikimedia.org with OS bullseye
  • 15:21 sukhe@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:20 sukhe@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - sukhe@cumin2002"
  • 15:16 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 15:15 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 15:02 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:58 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns1004.wikimedia.org with reason: host reimage
  • 14:55 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:54 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:51 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
  • 14:47 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1069.eqiad.wmnet
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:46 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-worker1003.eqiad.wmnet with reason: host reimage
  • 14:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:42 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:37 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:36 hnowlan@puppetmaster1001: conftool action : set/pooled=yes; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:35 hnowlan: reenabling puppet on A:cp
  • 14:31 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1069.eqiad.wmnet
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1068.eqiad.wmnet
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:28 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:27 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:27 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 14:25 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:25 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 14:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 14:22 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 14:20 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1068.eqiad.wmnet
  • 14:19 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
  • 14:19 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1067.eqiad.wmnet
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:18 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:16 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 14:15 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 14:14 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 14:14 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:13 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:12 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 14:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 14:06 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1067.eqiad.wmnet
  • 14:05 hnowlan: disabling puppet on A:cp-text to test 935464
  • 14:05 hnowlan@puppetmaster1001: conftool action : set/pooled=no; selector: service=ats-be,name=cp2037.codfw.wmnet
  • 14:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 14:02 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns1004.wikimedia.org with OS bullseye
  • 14:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-test-worker1003.eqiad.wmnet with OS bullseye
  • 13:56 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns1004.wikimedia.org with OS bullseye
  • 13:55 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 13:54 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 13:42 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudlb1001.eqiad.wmnet
  • 13:34 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:33 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 13:32 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1066.eqiad.wmnet
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:30 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:29 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:29 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 13:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 13:24 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1066.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:22 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 13:18 urbanecm@deploy1002: Finished scap: Backport for Enable global abuse filters on almost all projects (T341159) (duration: 10m 07s)
  • 13:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:17 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on an-worker1095.eqiad.wmnet with reason: Replacing RAID controller battery
  • 13:14 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1066.eqiad.wmnet
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1065.eqiad.wmnet
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:10 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 13:10 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 13:10 urbanecm@deploy1002: urbanecm: Backport for Enable global abuse filters on almost all projects (T341159) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:08 urbanecm@deploy1002: Started scap: Backport for Enable global abuse filters on almost all projects (T341159)
  • 13:08 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 13:02 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1065.eqiad.wmnet
  • 13:00 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 12:58 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1064.eqiad.wmnet
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:58 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 12:56 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on zookeeper-test1002.eqiad.wmnet with reason: host reimage
  • 12:43 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1064.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 12:42 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:40 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 12:35 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1064.eqiad.wmnet
  • 12:32 samtar@deploy1002: Finished scap: Backport for Revert "Add tag when reference added to the page" (T341202) (duration: 24m 04s)
  • 12:21 samtar@deploy1002: matmarex and samtar: Backport for Revert "Add tag when reference added to the page" (T341202) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 12:15 elukey@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:15 elukey@cumin1001: START - Cookbook sre.hosts.reimage for host zookeeper-test1002.eqiad.wmnet with OS bookworm
  • 12:08 samtar@deploy1002: Started scap: Backport for Revert "Add tag when reference added to the page" (T341202)
  • 11:56 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudlb1001.eqiad.wmnet with OS bullseye
  • 11:56 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:56 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts analytics1063.eqiad.wmnet
  • 11:56 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1002
  • 11:56 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1002
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:56 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:55 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:55 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:54 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
  • 11:54 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
  • 11:53 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:53 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:52 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:52 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 11:50 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:50 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:50 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudlb1001.eqiad.wmnet on all recursors
  • 11:50 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudlb1001.eqiad.wmnet on all recursors
  • 11:49 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:49 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:48 aborrero@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudlb1001
  • 11:48 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:48 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:48 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 11:47 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
  • 11:47 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:46 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:43 aborrero@cumin1001: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudlb1001
  • 11:42 aborrero@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cloudlb1001
  • 11:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts analytics1063.eqiad.wmnet
  • 11:41 stevemunene@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:39 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:38 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudlb - aborrero@cumin1001"
  • 11:35 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:34 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) (duration: 07m 37s)
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudswift1002.eqiad.wmnet
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:30 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:29 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudswift1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - aborrero@cumin1001"
  • 11:27 lucaswerkmeister-wmde@deploy1002: migr and lucaswerkmeister-wmde: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 11:27 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-test-eqiad cluster: Roll restart of jvm daemons.
  • 11:27 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:26 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Beta-Wikidata: Always show mul on desktop Termbox (T339104)
  • 11:25 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for foundationwiki: Enable WikibaseClient (T321967) (duration: 08m 58s)
  • 11:24 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudswift1001.eqiad.wmnet
  • 11:24 aborrero@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:23 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1002.eqiad.wmnet
  • 11:22 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 11:19 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1063.eqiad.wmnet
  • 11:18 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 11:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 11:17 lucaswerkmeister-wmde@deploy1002: varnent and lucaswerkmeister-wmde: Backport for foundationwiki: Enable WikibaseClient (T321967) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 11:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for foundationwiki: Enable WikibaseClient (T321967)
  • 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 11:14 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup (duration: 07m 35s)
  • 11:12 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 11:10 aborrero@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudswift1001.eqiad.wmnet
  • 11:07 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde: Backport for outreachwiki: Set wmgWikibaseSiteGroup synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for outreachwiki: Set wmgWikibaseSiteGroup
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1062.eqiad.wmnet
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 11:04 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 11:03 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on A:swift-fe
  • 10:58 taavi@deploy1002: Finished scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL (duration: 08m 21s)
  • 10:54 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1062.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:53 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:51 taavi@deploy1002: taavi: Backport for extdist: REL1_40 is stable, REL1_38 is EOL synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:49 taavi@deploy1002: Started scap: Backport for extdist: REL1_40 is stable, REL1_38 is EOL
  • 10:47 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 10:41 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1062.eqiad.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1061.eqiad.wmnet
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 stevemunene@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:08 stevemunene@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics1061.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1001"
  • 10:05 stevemunene@cumin1001: START - Cookbook sre.dns.netbox
  • 09:58 stevemunene@cumin1001: START - Cookbook sre.hosts.decommission for hosts analytics1061.eqiad.wmnet
  • 09:35 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on A:swift-fe
  • 09:28 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 09:13 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 09:11 elukey: restart kube-apiserver on ml-serve-ctrl2* as an attempt to fix LIST-related latency issues
  • 09:10 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.41.0-wmf.16 refs T340244
  • 08:55 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 08:55 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 08:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 08:50 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 08:49 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 08:49 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 08:45 fabfur: reenabled puppet on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet
  • 08:39 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 08:17 fabfur: disabling puppet temporarily on cp1075.eqiad.wmnet, cp2027.codfw.wmnet, cp3050.esams.wmnet to apply 935760 (T340983)
  • 08:03 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 07:31 kart_: Updated MinT to 2023-07-06-051402-production
  • 07:29 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 07:29 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:25 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:23 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:17 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:09 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:04 stevemunene@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
  • 07:04 stevemunene@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on 9 hosts with reason: Stopping puppet and hadoop-hdfs-datanode services then decommissioning the hosts
  • 06:54 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: GitLab minor version upgrade
  • 02:17 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:16 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/opentelemetry-collector: apply
  • 02:06 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/opentelemetry-collector: apply
  • 02:05 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/opentelemetry-collector: apply
  • 00:22 eileen: civicrm upgraded from 4ca2008d to 0ddd1a51
  • 00:03 rzl@deploy1002: helmfile [staging] DONE helmfile.d/services/opentelemetry-collector: apply
  • 00:02 rzl@deploy1002: helmfile [staging] START helmfile.d/services/opentelemetry-collector: apply

2023-07-05

  • 22:52 bking@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 22:38 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui2002
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:36 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 22:35 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui2002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:33 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:28 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui2002
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts xhgui1002
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:28 denisse@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:27 denisse@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: xhgui1002 decommissioned, removing all IPs except the asset tag one - denisse@cumin1001"
  • 22:24 denisse@cumin1001: START - Cookbook sre.dns.netbox
  • 22:23 mutante: registry1003 - sudo systemctl start build-homepage
  • 22:17 denisse@cumin1001: START - Cookbook sre.hosts.decommission for hosts xhgui1002
  • 21:23 bking@cumin1001: START - Cookbook sre.wdqs.data-transfer
  • 21:04 urbanecm@deploy1002: Finished scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) (duration: 08m 22s)
  • 20:57 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:55 urbanecm@deploy1002: Started scap: Backport for Optimize SVG wordmarks, enable Wikimania wordmark, fix techconduct (T338162)
  • 20:55 urbanecm@deploy1002: Finished scap: Backport for Update various logos where SVGs are available (T338162) (duration: 11m 10s)
  • 20:45 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Update various logos where SVGs are available (T338162) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:44 urbanecm@deploy1002: Started scap: Backport for Update various logos where SVGs are available (T338162)
  • 20:31 urbanecm@deploy1002: Finished scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) (duration: 12m 38s)
  • 20:20 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:19 urbanecm@deploy1002: Started scap: Backport for Add language button at the top of the Main page of Italian Wikivoyage (T337666)
  • 20:17 urbanecm@deploy1002: Finished scap: Backport for Disable the Nearby feature on some sister projects (T341133) (duration: 13m 12s)
  • 20:05 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Disable the Nearby feature on some sister projects (T341133) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:04 urbanecm@deploy1002: Started scap: Backport for Disable the Nearby feature on some sister projects (T341133)
  • 18:36 sukhe: re-enable puppet in A:dns-rec to finish merging CR 933497 and run-agent: T340479
  • 18:25 denisse: disable puppet on webperf1003 to test PHP memory changes for XHGui
  • 18:25 denisse: disable puppet on webperf1003
  • 18:20 sukhe: disable puppet on A:dns-rec to merge CR 933497
  • 17:40 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 17:07 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 17:07 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 17:03 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 17:02 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 16:33 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab minor version upgrade
  • 16:25 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 16:25 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host an-test-worker1003.eqiad.wmnet
  • 16:06 jelto@cumin1001: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 15:55 fabfur: re-enabled puppet in all cp- hosts (done @2023-07-05 14:22:57 UTC)
  • 15:38 mlitn@deploy1002: Finished deploy [airflow-dags/platform_eng@a97da10]: (no justification provided) (duration: 00m 25s)
  • 15:38 mlitn@deploy1002: Started deploy [airflow-dags/platform_eng@a97da10]: (no justification provided)
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 15:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 15:26 sukhe: reprepro -C component/dnsdist include bullseye-wikimedia dnsdist_1.8.0-1+wmf11u1_amd64.changes
  • 15:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 15:23 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 15:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 15:09 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1024.eqiad.wmnet
  • 15:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 15:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 15:05 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host an-test-worker1003.eqiad.wmnet
  • 15:00 jelto@cumin1001: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: GitLab minor version upgrade
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 14:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 14:48 btullis@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 14:44 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2013.codfw.wmnet
  • 14:44 vgutierrez@cumin1001: START - Cookbook sre.hosts.remove-downtime for lvs2013.codfw.wmnet
  • 14:42 sukhe: re-enable puppet and start pybal on lvs2013
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 14:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 14:38 vgutierrez: pool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 14:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:24 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:24 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:19 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:18 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:18 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:18 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:18 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:17 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 14:17 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:16 vgutierrez: depool cdn service in cp2027.codfw.wmnet,cp1075.eqiad.wmnet,cp3050.esams.wmnet
  • 14:16 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:16 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 14:15 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:11 fabfur: disabling puppet in all cp- hosts for error in configuration
  • 14:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 14:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 14:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 13:57 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:57 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:55 elukey: expand kafka topic partitions from 1 to 5 for {codfw,eqiad}.mediawiki.job.RecordLintJob and {eqiad,codfw}.mediawiki.job.refreshLinks on kafka-main eqiad/codfw - T338357
  • 13:53 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) (duration: 08m 54s)
  • 13:51 vgutierrez@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
  • 13:51 vgutierrez@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on lvs2013.codfw.wmnet with reason: mgmt interface issues
  • 13:45 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and anzx: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:44 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable Extension:RelatedArticles for desktop on frwikinews (T341105)
  • 13:41 sukhe: disable puppet and stop pybal on lvs2013: T340960
  • 13:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 13:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 13:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 13:02 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 12:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main'.
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1019.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1019.eqiad.wmnet
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 12:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 12:34 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
  • 12:34 aborrero@cumin2002: START - Cookbook sre.dns.wipe-cache cloudsw-b1.private.codfw.wikimedia.cloud on all recursors
  • 12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:31 aborrero@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
  • 12:30 aborrero@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-b1 codfw - aborrero@cumin2002"
  • 12:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 12:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow4002.ulsfo.wmnet
  • 12:28 aborrero@cumin2002: START - Cookbook sre.dns.netbox
  • 12:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow4002.ulsfo.wmnet
  • 12:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1001.wikimedia.org
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:59 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:58 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:55 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1001.wikimedia.org
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader1002.wikimedia.org
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:50 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:47 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:47 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:45 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader1002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 11:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:43 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:39 btullis@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-jumbo-eqiad cluster: Roll restart of jvm daemons.
  • 11:37 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader1002.wikimedia.org
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2002.wikimedia.org
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:34 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 11:25 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2002.wikimedia.org
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts urldownloader2001.wikimedia.org
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 11:09 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: urldownloader2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 11:06 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:01 btullis@cumin1001: START - Cookbook sre.presto.roll-restart-workers for Presto analytics cluster: Roll restart of all Presto's jvm daemons.
  • 11:00 mvernon@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts urldownloader2001.wikimedia.org
  • 10:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 10:52 mvernon@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 10:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 10:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 10:41 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 10:40 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 10:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 10:22 godog: restore US business hours escalation - T340763
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 10:14 jmm@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast2003.wikimedia.org
  • 10:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 10:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 10:05 jmm@cumin1001: START - Cookbook sre.hosts.reboot-single for host bast2003.wikimedia.org
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:48 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:47 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1015.eqiad.wmnet
  • 09:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 09:43 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:41 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:41 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:40 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw - aborrero@cumin1001"
  • 09:39 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:39 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 09:38 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 09:37 claime: running puppet on 'A:cp-text and P:trafficserver::backend' - T341078
  • 09:36 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:36 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:35 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:35 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:31 claime: Sending 0.5% of global traffic to mw-on-k8s - T341078
  • 09:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 09:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow1002.eqiad.wmnet to drbd
  • 09:27 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:27 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:26 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 19s)
  • 09:25 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:24 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:24 claime: redeploy mw-on-k8s following quota update - T341114
  • 09:24 cgoubert@deploy1002: Started scap: (no justification provided)
  • 09:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:22 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:21 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:21 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:19 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:19 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow1002.eqiad.wmnet to drbd
  • 09:15 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:11 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:10 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:09 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:09 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:08 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 09:07 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:07 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 09:06 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:06 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:04 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:04 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:03 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 09:02 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:02 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 09:01 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 09:01 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:53 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:52 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 08:52 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:45 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:45 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:44 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.41.0-wmf.16 refs T340244
  • 08:40 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:40 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:39 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:34 btullis@cumin1001: END (PASS) - Cookbook sre.hadoop.roll-restart-workers (exit_code=0) restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:26 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop test cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 08:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 08:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 08:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 08:10 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1012.eqiad.wmnet
  • 00:09 zabe@deploy1002: Finished scap: update interwiki cache (duration: 06m 51s)
  • 00:02 zabe@deploy1002: Started scap: update interwiki cache

2023-07-04

  • 23:58 zabe@deploy1002: Finished scap: T335969 (duration: 07m 40s)
  • 23:52 zabe@deploy1002: zabe: T335969 synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 23:50 zabe@deploy1002: Started scap: T335969
  • 23:50 zabe: create Wikipedia Ghanaian Pidgin # T335969
  • 22:57 zabe@deploy1002: Finished scap: Backport for Remove migrateStewards.php reference (duration: 07m 23s)
  • 22:52 zabe@deploy1002: taavi and zabe: Backport for Remove migrateStewards.php reference synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 22:50 zabe@deploy1002: Started scap: Backport for Remove migrateStewards.php reference
  • 22:46 zabe@deploy1002: Finished scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) (duration: 07m 56s)
  • 22:39 zabe@deploy1002: zabe: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 22:38 zabe@deploy1002: Started scap: Backport for Stop setting $wgCommentTempTableSchemaMigrationStage (T299954)
  • 19:38 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:37 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:37 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P49512 and previous config saved to /var/cache/conftool/dbconfig/20230704-192646-ladsgroup.json
  • 19:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:23 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:21 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P49511 and previous config saved to /var/cache/conftool/dbconfig/20230704-191142-ladsgroup.json
  • 19:09 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 19:07 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:07 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:01 jgleeson: payments-wiki upgraded from cbc0b454 to d76b9085
  • 18:59 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P49510 and previous config saved to /var/cache/conftool/dbconfig/20230704-185637-ladsgroup.json
  • 18:56 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P49509 and previous config saved to /var/cache/conftool/dbconfig/20230704-184132-ladsgroup.json
  • 18:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 18:38 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db2165 T339223', diff saved to https://phabricator.wikimedia.org/P49508 and previous config saved to /var/cache/conftool/dbconfig/20230704-183748-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db2161 to s8 primary T339223', diff saved to https://phabricator.wikimedia.org/P49507 and previous config saved to /var/cache/conftool/dbconfig/20230704-183434-ladsgroup.json
  • 18:32 Amir1: Starting s8 codfw failover from db2165 to db2161 - T339223
  • 18:31 sukhe: finished running homer for adding fabfur [pushed to all 55 devices successfully]
  • 18:25 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 18:06 sukhe: enable puppet on A:wikidough to roll out CR 863295
  • 18:01 sukhe: disable puppet on A:wikidough to roll out CR 863295
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db2161 with weight 0 T339223', diff saved to https://phabricator.wikimedia.org/P49506 and previous config saved to /var/cache/conftool/dbconfig/20230704-175604-ladsgroup.json
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
  • 17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T339223
  • 17:36 sukhe: [correction] homer "*" commit "Gerrit: 935479 add fabfur"
  • 17:36 sukhe: homer "*" commit "Gerrit: 935479 add fabur"
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:14 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 16:03 jbond@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=puppetboard,name=codfw
  • 16:02 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.discovery.wmnet on all recursors
  • 16:02 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.discovery.wmnet on all recursors
  • 15:57 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard.wikimedia.org on all recursors
  • 15:57 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard.wikimedia.org on all recursors
  • 15:56 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard,name=codfw
  • 15:46 Emperor: delete swift container global-data-elastic-backups in AUTH_search account T341081
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 15:26 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 15:26 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 15:25 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 15:24 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 15:19 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 15:12 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 15:08 jbond@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) puppetboard-next.wikimedia.org on all recursors
  • 15:08 jbond@cumin1001: START - Cookbook sre.dns.wipe-cache puppetboard-next.wikimedia.org on all recursors
  • 15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
  • 15:03 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.private.eqiad.wikimedia.cloud - aborrero@cumin1001"
  • 15:01 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 aborrero@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
  • 15:00 aborrero@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudsw-c8.eqiad.codfw.wikimedia.cloud - aborrero@cumin1001"
  • 14:58 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:58 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:58 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 14:56 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:56 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:53 claime: Deploying encrypted rsync to deployment servers - T289857
  • 14:52 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:52 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:50 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:49 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:46 jbond@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=puppetboard-next,name=codfw
  • 14:43 cgoubert@deploy1002: Finished scap: (no justification provided) (duration: 02m 12s)
  • 14:42 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:42 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:41 cgoubert@deploy1002: Started scap: (no justification provided)
  • 14:41 claime: redeploying mw-on-k8s
  • 14:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:40 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:39 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:38 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:37 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:36 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:36 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:33 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:33 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:32 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:31 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:29 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:27 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:27 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:23 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:22 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:20 gmodena@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:20 gmodena@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:18 gmodena@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:18 gmodena@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:16 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:16 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 18m 41s)
  • 14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:03 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:03 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 14:02 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:01 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 14:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 14:00 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 14:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 14:00 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 14:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 13:59 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and urbanecm: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 13:58 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 13:57 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
  • 13:55 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: sync
  • 13:55 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: sync
  • 13:51 jiji@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:50 jiji@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:50 jiji@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:50 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox: apply
  • 13:50 jiji@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:49 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:45 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams: apply
  • 13:44 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) (duration: 08m 25s)
  • 13:40 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/zotero: apply
  • 13:39 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 13:38 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/wikifeeds: apply
  • 13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/toolhub: apply
  • 13:37 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/termbox: apply
  • 13:37 lucaswerkmeister-wmde@deploy1002: urbanecm and lucaswerkmeister-wmde: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/tegola-vector-tiles: apply
  • 13:36 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/similar-users: apply
  • 13:35 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for DeleteAction: Avoid displaying the form unconditionally (T341002)
  • 13:35 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-timeline: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-media: apply
  • 13:34 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/shellbox-constraints: apply
  • 13:33 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/recommendation-api: apply
  • 13:32 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 13:32 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/rdf-streaming-updater: apply
  • 13:32 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 13:31 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 13:31 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 13:28 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:28 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:27 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/push-notifications: apply
  • 13:26 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/proton: apply
  • 13:22 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/mobileapps: apply
  • 13:18 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/miscweb: apply
  • 13:17 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/machinetranslation: apply
  • 13:11 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/linkrecommendation: apply
  • 13:10 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/image-suggestion: apply
  • 13:09 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventstreams-internal: apply
  • 13:08 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-main: apply
  • 13:05 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-logging-external: apply
  • 13:03 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics-external: apply
  • 13:02 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/eventgate-analytics: apply
  • 13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/zotero: apply
  • 13:01 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/wikifeeds: apply
  • 13:01 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/echostore: apply
  • 13:00 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/device-analytics: apply
  • 12:59 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/developer-portal: apply
  • 12:57 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/cxserver: apply
  • 12:56 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/citoid: apply
  • 12:55 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/toolhub: apply
  • 12:55 jayme@deploy1002: helmfile [eqiad] OK helmfile.d/services/blubberoid: apply
  • 12:54 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/termbox: apply
  • 12:53 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/tegola-vector-tiles: apply
  • 12:51 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:51 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:50 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:50 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/similar-users: apply
  • 12:49 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:49 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-timeline: apply
  • 12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:48 jiji@deploy1002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
  • 12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 12:48 jiji@deploy1002: helmfile [staging] START helmfile.d/services/ipoid: apply
  • 12:48 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-media: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox-constraints: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/shellbox: apply
  • 12:47 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/recommendation-api: apply
  • 12:46 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/rdf-streaming-updater: apply
  • 12:45 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/push-notifications: apply
  • 12:44 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/proton: apply
  • 12:42 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/mobileapps: apply
  • 12:41 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/miscweb: apply
  • 12:40 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/machinetranslation: apply
  • 12:34 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/linkrecommendation: apply
  • 12:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 12:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 12:30 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/image-suggestion: apply
  • 12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams-internal: apply
  • 12:29 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventstreams: apply
  • 12:28 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-main: apply
  • 12:27 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-logging-external: apply
  • 12:27 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:26 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:25 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics-external: apply
  • 12:24 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/eventgate-analytics: apply
  • 12:23 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/echostore: apply
  • 12:22 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/device-analytics: apply
  • 12:21 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/developer-portal: apply
  • 12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/cxserver: apply
  • 12:20 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/citoid: apply
  • 12:19 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/blubberoid: apply
  • 12:18 jayme@deploy1002: helmfile [codfw] OK helmfile.d/services/apertium: apply
  • 12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:12 jayme@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:55 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/similar-users: apply
  • 11:53 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/miscweb: apply
  • 11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/toolhub: apply
  • 11:48 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox: apply
  • 11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams-internal: apply
  • 11:47 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-main: apply
  • 11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:46 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics: apply
  • 11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/linkrecommendation: apply
  • 11:45 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/mobileapps: apply
  • 11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/device-analytics: apply
  • 11:43 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/citoid: apply
  • 11:42 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-media: apply
  • 11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/zotero: apply
  • 11:41 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/cxserver: apply
  • 11:40 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-analytics-external: apply
  • 11:33 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/apertium: apply
  • 11:32 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/tegola-vector-tiles: apply
  • 11:31 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-timeline: apply
  • 11:29 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/shellbox-constraints: apply
  • 11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/api-gateway: apply
  • 11:28 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/wikifeeds: apply
  • 11:27 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/termbox: apply
  • 11:26 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/blubberoid: apply
  • 11:26 jayme@deploy1002: helmfile [staging] FAIL (1) helmfile.d/services/similar-users: apply
  • 11:17 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:17 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:12 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/push-notifications: apply
  • 11:10 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rdf-streaming-updater: apply
  • 11:08 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:08 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:04 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:04 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:03 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:03 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:53 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventstreams: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/echostore: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/image-suggestion: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/recommendation-api: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/machinetranslation: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/sessionstore: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/developer-portal: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/eventgate-logging-external: -i apply
  • 10:52 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
  • 10:20 jayme@deploy1002: helmfile [staging] FAIL (3) helmfile.d/services/mw-api-int: -i apply
  • 10:20 jayme@deploy1002: helmfile [staging] OK helmfile.d/services/rest-gateway: -i apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 10:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:04 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.41.0-wmf.16 refs T340244
  • 09:55 jnuche@deploy1002: Pruned MediaWiki: 1.41.0-wmf.13 (duration: 02m 11s)
  • 09:52 jnuche@deploy1002: Finished scap: testwikis wikis to 1.41.0-wmf.16 refs T340244 (duration: 50m 51s)
  • 09:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 09:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 09:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 09:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 09:43 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:42 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:41 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:38 jayme: updated envoyproxy to 1.23.10 on all nodes - T300324
  • 09:37 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:36 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 09:36 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 09:07 btullis@cumin1001: END (FAIL) - Cookbook sre.hadoop.roll-restart-workers (exit_code=99) restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.
  • 09:02 jnuche@deploy1002: Started scap: testwikis wikis to 1.41.0-wmf.16 refs T340244
  • 08:56 btullis@cumin1001: START - Cookbook sre.hadoop.roll-restart-workers restart workers for Hadoop analytics cluster: Roll restart of jvm daemons for openjdk upgrade.

2023-07-03

  • 22:00 eileen: civicrm upgraded from 9e04c92d to 4ca2008d
  • 20:18 jiji@cumin1001: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 20:15 effie: restarting swift proxies
  • 20:14 jiji@cumin1001: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:eqiad and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 19:11 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:11 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:10 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 19:09 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:10 effie: restarting pybal on lvs2013
  • 16:04 effie: restarting pybal on lvs2014
  • 15:57 effie: restarting pybal on lvs2014
  • 15:52 effie: restarting pybal on lvs1019
  • 15:49 effie: restarting pybal on lvs1020
  • 15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster2002.codfw.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:34 jiji@cumin1001: conftool action : set/pooled=yes; selector: name=kubestagemaster2002.codfw.wmnet
  • 15:14 jiji@cumin1001: conftool action : set/weight=10; selector: dc=codfw,cluster=kubernetes-staging,service=kubemaster
  • 15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1002.eqiad.wmnet
  • 15:12 jiji@cumin1001: conftool action : set/weight=10; selector: name=kubestagemaster1001.eqiad.wmnet
  • 15:09 moritzm: installing Java 8 security updates on Hadoop systems
  • 14:25 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 14:04 jiji@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 14:01 jiji@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2002.codfw.wmnet with reason: host reimage
  • 13:50 jiji@cumin1001: START - Cookbook sre.hosts.reimage for host kubestagemaster2002.codfw.wmnet with OS bullseye
  • 13:37 moritzm: installing openjdk-8 security updates
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1011.eqiad.wmnet
  • 13:28 urbanecm: UTC afternoon B&C window done
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) (duration: 08m 32s)
  • 13:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 13:22 volans@cumin1001: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 13:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:22 volans@cumin1001: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 13:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:21 urbanecm: Run `wikiadmin2023@10.64.16.184(idwiki)> DELETE FROM `category` WHERE cat_title = ; ` (T336780)
  • 13:20 urbanecm@deploy1002: func and urbanecm: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:19 urbanecm@deploy1002: Started scap: Backport for SpecialLog: Fix issues related to IP users (T338042 T340929)
  • 13:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:10 urbanecm@deploy1002: Finished scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) (duration: 08m 09s)
  • 13:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 13:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 13:04 urbanecm@deploy1002: jhsoby and urbanecm: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 13:02 urbanecm@deploy1002: Started scap: Backport for Set wgCollectionDisableSidebarLink for nowiki (T340981)
  • 13:00 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 12:59 kart_: Updated MinT to 2023-06-29-061037-production (T340709 + Fixed repetition with Santali)
  • 12:57 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:51 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:50 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 12:46 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:42 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 12:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 12:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 12:33 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:31 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 12:23 kart_: Updated cxserver to 2023-07-03-045311-production (T285217)
  • 12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 12:18 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:17 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 1271 hosts
  • 12:12 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 1271 hosts
  • 12:12 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 20 hosts
  • 12:12 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 20 hosts
  • 12:12 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Awjrichards out of all services on: 760 hosts
  • 12:11 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Awjrichards out of all services on: 760 hosts
  • 12:09 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:08 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 760 hosts
  • 12:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 760 hosts
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 1271 hosts
  • 12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 1271 hosts
  • 12:03 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JMinor out of all services on: 20 hosts
  • 12:03 jmm@cumin2002: START - Cookbook sre.idm.logout Logging JMinor out of all services on: 20 hosts
  • 11:38 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:35 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:33 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 20 hosts
  • 11:26 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 20 hosts
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 1271 hosts
  • 11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 1271 hosts
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jameel Kaisar out of all services on: 760 hosts
  • 11:24 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jameel Kaisar out of all services on: 760 hosts
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 jiji@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
  • 11:15 jiji@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add VIP for kubestagemaster - jiji@cumin1001"
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 760 hosts
  • 11:10 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 760 hosts
  • 11:10 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 1271 hosts
  • 11:09 jiji@cumin1001: START - Cookbook sre.dns.netbox
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 1271 hosts
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 20 hosts
  • 11:09 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 20 hosts
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [eqiad] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] [main] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] [canary] DONE helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] [canary] START helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] [main] START helmfile.d/services/mw-jobrunner : sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:51 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:50 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:50 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:49 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:49 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:48 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:48 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:48 topranks: Re-activating Vodafone DE peering at AMS-IX T340670
  • 10:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:42 jayme: imported envoyproxy 1.23.10 to buster-wikimedia, bullseye-wikimedia, bookworm-wikimedia - T300324
  • 10:19 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:18 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:18 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:17 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:03 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 09:58 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:58 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:57 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:57 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 1271 hosts
  • 09:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 1271 hosts
  • 09:37 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Appledora out of all services on: 760 hosts
  • 09:37 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Appledora out of all services on: 760 hosts
  • 09:36 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 09:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 09:35 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 09:34 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 09:34 ladsgroup@deploy1002: Finished scap: Backport for Set externallinks migration to read new everywhere except commons (T335343) (duration: 10m 46s)
  • 09:34 volans@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
  • 09:34 volans@cumin1001: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on debmonitor2003.codfw.wmnet with reason: WIP host
  • 09:25 ladsgroup@deploy1002: ladsgroup: Backport for Set externallinks migration to read new everywhere except commons (T335343) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:24 ladsgroup@deploy1002: Started scap: Backport for Set externallinks migration to read new everywhere except commons (T335343)
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 760 hosts
  • 09:19 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 760 hosts
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 1271 hosts
  • 09:18 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 1271 hosts
  • 09:18 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 20 hosts
  • 09:17 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 20 hosts
  • 09:13 lucaswerkmeister-wmde: Deployed security patch for T339016
  • 09:05 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Barakat Ajadi out of all services on: 4 hosts
  • 09:04 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Barakat Ajadi out of all services on: 4 hosts
  • 08:58 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 20 hosts
  • 08:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 20 hosts
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 1271 hosts
  • 08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 1271 hosts
  • 08:55 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Michael.hay out of all services on: 760 hosts
  • 08:55 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Michael.hay out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 760 hosts
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 1271 hosts
  • 08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 1271 hosts
  • 08:53 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Skye Berghel out of all services on: 20 hosts
  • 08:53 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Skye Berghel out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 20 hosts
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 1271 hosts
  • 08:52 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 1271 hosts
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Tom Magerlein out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Tom Magerlein out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 760 hosts
  • 08:51 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 760 hosts
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 1271 hosts
  • 08:50 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 1271 hosts
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Damiendf out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Damiendf out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 1271 hosts
  • 08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 1271 hosts
  • 08:48 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging David.pujol out of all services on: 760 hosts
  • 08:48 jmm@cumin2002: START - Cookbook sre.idm.logout Logging David.pujol out of all services on: 760 hosts
  • 08:48 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 760 hosts
  • 08:47 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 760 hosts
  • 08:47 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 1271 hosts
  • 08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 1271 hosts
  • 08:46 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Dasm out of all services on: 20 hosts
  • 08:46 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Dasm out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 20 hosts
  • 08:45 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 1271 hosts
  • 08:45 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 1271 hosts
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aranyap out of all services on: 760 hosts
  • 08:44 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Aranyap out of all services on: 760 hosts
  • 08:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/datahub: sync on main
  • 08:33 btullis@deploy1002: helmfile [staging] START helmfile.d/services/datahub: apply on main
  • 07:54 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: sync
  • 07:54 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: sync
  • 07:32 taavi@deploy1002: Finished scap: Backport for Update plwiki autopromote per consensus (T340397) (duration: 07m 48s)
  • 07:25 taavi@deploy1002: msz2001 and taavi: Backport for Update plwiki autopromote per consensus (T340397) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:24 taavi@deploy1002: Started scap: Backport for Update plwiki autopromote per consensus (T340397)
  • 07:22 taavi@deploy1002: Finished scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) (duration: 18m 21s)
  • 07:13 taavi@deploy1002: soda and taavi: Backport for Enable edit-in-sequence in Italian Wikisource (T340847) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:04 taavi@deploy1002: Started scap: Backport for Enable edit-in-sequence in Italian Wikisource (T340847)

2023-07-01
