Server Admin Log/Archive 58

From Wikitech

2022-10-31

  • 22:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37282 and previous config saved to /var/cache/conftool/dbconfig/20221031-222151-ladsgroup.json
  • 22:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37281 and previous config saved to /var/cache/conftool/dbconfig/20221031-220645-ladsgroup.json
  • 21:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37280 and previous config saved to /var/cache/conftool/dbconfig/20221031-215138-ladsgroup.json
  • 21:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37279 and previous config saved to /var/cache/conftool/dbconfig/20221031-213632-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T318605)', diff saved to https://phabricator.wikimedia.org/P37278 and previous config saved to /var/cache/conftool/dbconfig/20221031-212749-ladsgroup.json
  • 21:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 21:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37277 and previous config saved to /var/cache/conftool/dbconfig/20221031-212717-ladsgroup.json
  • 21:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37276 and previous config saved to /var/cache/conftool/dbconfig/20221031-211210-ladsgroup.json
  • 20:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37275 and previous config saved to /var/cache/conftool/dbconfig/20221031-205703-ladsgroup.json
  • 20:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37274 and previous config saved to /var/cache/conftool/dbconfig/20221031-204157-ladsgroup.json
  • 20:33 otto@deploy1002: Synchronized wmf-config/InitialiseSettings.php: No-op sync of InitialiseSettings.php to declare stream rc0.mediawiki.page_change. This stream is disabled everywhere by default, and only enabled in beta for now. - T311129 (duration: 03m 42s)
  • 20:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T318605)', diff saved to https://phabricator.wikimedia.org/P37273 and previous config saved to /var/cache/conftool/dbconfig/20221031-203319-ladsgroup.json
  • 20:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37272 and previous config saved to /var/cache/conftool/dbconfig/20221031-203258-ladsgroup.json
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:32 cjming: end of UTC late backport window
  • 20:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:26 cjming@deploy1002: Finished scap: Backport for cirrus: Correct comments in ProductionServices.php (T262630) (duration: 05m 57s)
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:20 cjming@deploy1002: cjming and ebernhardson: Backport for cirrus: Correct comments in ProductionServices.php (T262630) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:20 cjming@deploy1002: Started scap: Backport for cirrus: Correct comments in ProductionServices.php (T262630)
  • 20:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37271 and previous config saved to /var/cache/conftool/dbconfig/20221031-201751-ladsgroup.json
  • 20:17 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:17 cjming@deploy1002: Finished scap: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016) (duration: 04m 17s)
  • 20:13 cjming@deploy1002: cjming and cjming: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:12 cjming@deploy1002: Started scap: Backport for Update sample rate for edit attempt stream to 1 for group 0. (T312016)
  • 20:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:09 cjming@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318) (duration: 05m 44s)
  • 20:03 cjming@deploy1002: cjming and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:03 cjming@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on itwiki (T314318)
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37270 and previous config saved to /var/cache/conftool/dbconfig/20221031-200245-ladsgroup.json
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37269 and previous config saved to /var/cache/conftool/dbconfig/20221031-194738-ladsgroup.json
  • 19:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37268 and previous config saved to /var/cache/conftool/dbconfig/20221031-194614-ladsgroup.json
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P37267 and previous config saved to /var/cache/conftool/dbconfig/20221031-193108-ladsgroup.json
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P37265 and previous config saved to /var/cache/conftool/dbconfig/20221031-191601-ladsgroup.json
  • 19:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 19:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 19:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37264 and previous config saved to /var/cache/conftool/dbconfig/20221031-190054-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T318605)', diff saved to https://phabricator.wikimedia.org/P37263 and previous config saved to /var/cache/conftool/dbconfig/20221031-184729-ladsgroup.json
  • 18:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37262 and previous config saved to /var/cache/conftool/dbconfig/20221031-184707-ladsgroup.json
  • 18:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37261 and previous config saved to /var/cache/conftool/dbconfig/20221031-183201-ladsgroup.json
  • 18:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T318955)', diff saved to https://phabricator.wikimedia.org/P37260 and previous config saved to /var/cache/conftool/dbconfig/20221031-183052-ladsgroup.json
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37259 and previous config saved to /var/cache/conftool/dbconfig/20221031-181654-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P37258 and previous config saved to /var/cache/conftool/dbconfig/20221031-181546-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37257 and previous config saved to /var/cache/conftool/dbconfig/20221031-180148-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T318605)', diff saved to https://phabricator.wikimedia.org/P37256 and previous config saved to /var/cache/conftool/dbconfig/20221031-180049-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P37255 and previous config saved to /var/cache/conftool/dbconfig/20221031-180039-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37254 and previous config saved to /var/cache/conftool/dbconfig/20221031-180021-ladsgroup.json
  • 17:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 17:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37250 and previous config saved to /var/cache/conftool/dbconfig/20221031-174301-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37249 and previous config saved to /var/cache/conftool/dbconfig/20221031-174052-ladsgroup.json
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P37248 and previous config saved to /var/cache/conftool/dbconfig/20221031-173008-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P37247 and previous config saved to /var/cache/conftool/dbconfig/20221031-172755-ladsgroup.json
  • 17:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P37246 and previous config saved to /var/cache/conftool/dbconfig/20221031-172545-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37245 and previous config saved to /var/cache/conftool/dbconfig/20221031-171501-ladsgroup.json
  • 17:14 mutante: contint1001 - racadm serveraction powercycle - crashed
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P37244 and previous config saved to /var/cache/conftool/dbconfig/20221031-171248-ladsgroup.json
  • 17:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P37243 and previous config saved to /var/cache/conftool/dbconfig/20221031-171039-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37242 and previous config saved to /var/cache/conftool/dbconfig/20221031-170935-ladsgroup.json
  • 17:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37241 and previous config saved to /var/cache/conftool/dbconfig/20221031-170925-ladsgroup.json
  • 17:02 mutante: contint1001 - just went fully down without maintenance work, fortunately 2001 is the prod CI server currently
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37240 and previous config saved to /var/cache/conftool/dbconfig/20221031-165742-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37239 and previous config saved to /var/cache/conftool/dbconfig/20221031-165532-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T318955)', diff saved to https://phabricator.wikimedia.org/P37238 and previous config saved to /var/cache/conftool/dbconfig/20221031-165532-ladsgroup.json
  • 16:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 16:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37237 and previous config saved to /var/cache/conftool/dbconfig/20221031-165511-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37236 and previous config saved to /var/cache/conftool/dbconfig/20221031-165418-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T318955)', diff saved to https://phabricator.wikimedia.org/P37235 and previous config saved to /var/cache/conftool/dbconfig/20221031-164431-ladsgroup.json
  • 16:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37234 and previous config saved to /var/cache/conftool/dbconfig/20221031-164409-ladsgroup.json
  • 16:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P37233 and previous config saved to /var/cache/conftool/dbconfig/20221031-164004-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37232 and previous config saved to /var/cache/conftool/dbconfig/20221031-163912-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P37231 and previous config saved to /var/cache/conftool/dbconfig/20221031-162903-ladsgroup.json
  • 16:25 hashar@deploy1002: Finished deploy [integration/docroot@0ff8642]: build: Use disableProcessTimeout() for serve commands only (duration: 00m 25s)
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P37230 and previous config saved to /var/cache/conftool/dbconfig/20221031-162458-ladsgroup.json
  • 16:24 hashar@deploy1002: Started deploy [integration/docroot@0ff8642]: build: Use disableProcessTimeout() for serve commands only
  • 16:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37229 and previous config saved to /var/cache/conftool/dbconfig/20221031-162405-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37228 and previous config saved to /var/cache/conftool/dbconfig/20221031-162311-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37227 and previous config saved to /var/cache/conftool/dbconfig/20221031-162249-ladsgroup.json
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=varnish-fe
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-be
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet,service=ats-tls
  • 16:15 aborrero@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T318605)', diff saved to https://phabricator.wikimedia.org/P37226 and previous config saved to /var/cache/conftool/dbconfig/20221031-161448-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37225 and previous config saved to /var/cache/conftool/dbconfig/20221031-161426-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P37224 and previous config saved to /var/cache/conftool/dbconfig/20221031-161356-ladsgroup.json
  • 16:13 aborrero@cumin1001: START - Cookbook sre.dns.netbox
  • 16:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37223 and previous config saved to /var/cache/conftool/dbconfig/20221031-160951-ladsgroup.json
  • 16:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P37222 and previous config saved to /var/cache/conftool/dbconfig/20221031-160743-ladsgroup.json
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T318955)', diff saved to https://phabricator.wikimedia.org/P37221 and previous config saved to /var/cache/conftool/dbconfig/20221031-160641-ladsgroup.json
  • 16:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 16:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T318955)', diff saved to https://phabricator.wikimedia.org/P37220 and previous config saved to /var/cache/conftool/dbconfig/20221031-160620-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P37219 and previous config saved to /var/cache/conftool/dbconfig/20221031-155919-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37218 and previous config saved to /var/cache/conftool/dbconfig/20221031-155850-ladsgroup.json
  • 15:56 ryankemper: [Elastic] `ryankemper@elastic2052:~$ sudo reboot` to grab latest kernel
  • 15:54 ryankemper: [Elastic] `ryankemper@elastic2043:~$ sudo pool` (cluster back to green and DIMM A2 has been switched out by dc-ops); marked as `Active` in netbox
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=varnish-fe
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-be
  • 15:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-tls
  • 15:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P37217 and previous config saved to /var/cache/conftool/dbconfig/20221031-155236-ladsgroup.json
  • 15:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P37216 and previous config saved to /var/cache/conftool/dbconfig/20221031-155113-ladsgroup.json
  • 15:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=varnish-fe
  • 15:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-be
  • 15:50 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.drmrs.wmnet,service=ats-tls
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37215 and previous config saved to /var/cache/conftool/dbconfig/20221031-154638-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37214 and previous config saved to /var/cache/conftool/dbconfig/20221031-154627-ladsgroup.json
  • 15:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P37213 and previous config saved to /var/cache/conftool/dbconfig/20221031-154413-ladsgroup.json
  • 15:43 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 34s)
  • 15:39 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 03m 43s)
  • 15:39 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:38 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:37 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37212 and previous config saved to /var/cache/conftool/dbconfig/20221031-153730-ladsgroup.json
  • 15:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P37211 and previous config saved to /var/cache/conftool/dbconfig/20221031-153607-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P37210 and previous config saved to /var/cache/conftool/dbconfig/20221031-153121-ladsgroup.json
  • 15:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37209 and previous config saved to /var/cache/conftool/dbconfig/20221031-152906-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T318955)', diff saved to https://phabricator.wikimedia.org/P37206 and previous config saved to /var/cache/conftool/dbconfig/20221031-151851-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P37205 and previous config saved to /var/cache/conftool/dbconfig/20221031-151612-ladsgroup.json
  • 15:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37204 and previous config saved to /var/cache/conftool/dbconfig/20221031-150919-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 75%: Maint done', diff saved to https://phabricator.wikimedia.org/P37203 and previous config saved to /var/cache/conftool/dbconfig/20221031-150517-ladsgroup.json
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37202 and previous config saved to /var/cache/conftool/dbconfig/20221031-150105-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P37201 and previous config saved to /var/cache/conftool/dbconfig/20221031-145413-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 50%: Maint done', diff saved to https://phabricator.wikimedia.org/P37200 and previous config saved to /var/cache/conftool/dbconfig/20221031-145012-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37199 and previous config saved to /var/cache/conftool/dbconfig/20221031-144840-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37198 and previous config saved to /var/cache/conftool/dbconfig/20221031-144819-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T318605)', diff saved to https://phabricator.wikimedia.org/P37197 and previous config saved to /var/cache/conftool/dbconfig/20221031-144511-ladsgroup.json
  • 14:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37196 and previous config saved to /var/cache/conftool/dbconfig/20221031-144449-ladsgroup.json
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P37195 and previous config saved to /var/cache/conftool/dbconfig/20221031-143906-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1100 (re)pooling @ 25%: Maint done', diff saved to https://phabricator.wikimedia.org/P37194 and previous config saved to /var/cache/conftool/dbconfig/20221031-143507-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T318605)', diff saved to https://phabricator.wikimedia.org/P37193 and previous config saved to /var/cache/conftool/dbconfig/20221031-143458-ladsgroup.json
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37192 and previous config saved to /var/cache/conftool/dbconfig/20221031-143430-ladsgroup.json
  • 14:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P37191 and previous config saved to /var/cache/conftool/dbconfig/20221031-143312-ladsgroup.json
  • 14:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P37190 and previous config saved to /var/cache/conftool/dbconfig/20221031-142942-ladsgroup.json
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37189 and previous config saved to /var/cache/conftool/dbconfig/20221031-142400-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P37188 and previous config saved to /var/cache/conftool/dbconfig/20221031-141924-ladsgroup.json
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P37187 and previous config saved to /var/cache/conftool/dbconfig/20221031-141806-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37186 and previous config saved to /var/cache/conftool/dbconfig/20221031-141701-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P37185 and previous config saved to /var/cache/conftool/dbconfig/20221031-141436-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37184 and previous config saved to /var/cache/conftool/dbconfig/20221031-141404-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318955)', diff saved to https://phabricator.wikimedia.org/P37183 and previous config saved to /var/cache/conftool/dbconfig/20221031-141342-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P37182 and previous config saved to /var/cache/conftool/dbconfig/20221031-140417-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37181 and previous config saved to /var/cache/conftool/dbconfig/20221031-140259-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37180 and previous config saved to /var/cache/conftool/dbconfig/20221031-140153-ladsgroup.json
  • 13:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37179 and previous config saved to /var/cache/conftool/dbconfig/20221031-135929-ladsgroup.json
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P37178 and previous config saved to /var/cache/conftool/dbconfig/20221031-135836-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T318955)', diff saved to https://phabricator.wikimedia.org/P37177 and previous config saved to /var/cache/conftool/dbconfig/20221031-135039-ladsgroup.json
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37176 and previous config saved to /var/cache/conftool/dbconfig/20221031-135013-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37175 and previous config saved to /var/cache/conftool/dbconfig/20221031-134911-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318955)', diff saved to https://phabricator.wikimedia.org/P37170 and previous config saved to /var/cache/conftool/dbconfig/20221031-132823-ladsgroup.json
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:10 urbanecm@deploy1002: urbanecm and mlitn: Backport for Update i18n for ca, nb, fi & hu (T300064) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P37165 and previous config saved to /var/cache/conftool/dbconfig/20221031-131028-ladsgroup.json
  • 13:10 urbanecm@deploy1002: Started scap: Backport for Update i18n for ca, nb, fi & hu (T300064)
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37164 and previous config saved to /var/cache/conftool/dbconfig/20221031-130834-marostegui.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37163 and previous config saved to /var/cache/conftool/dbconfig/20221031-130651-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37162 and previous config saved to /var/cache/conftool/dbconfig/20221031-130629-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37161 and previous config saved to /var/cache/conftool/dbconfig/20221031-130454-ladsgroup.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37160 and previous config saved to /var/cache/conftool/dbconfig/20221031-130244-marostegui.json
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37159 and previous config saved to /var/cache/conftool/dbconfig/20221031-130217-ladsgroup.json
  • 13:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db1098:3316', diff saved to https://phabricator.wikimedia.org/P37158 and previous config saved to /var/cache/conftool/dbconfig/20221031-130016-marostegui.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P37157 and previous config saved to /var/cache/conftool/dbconfig/20221031-125521-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37156 and previous config saved to /var/cache/conftool/dbconfig/20221031-125509-ladsgroup.json
  • 12:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T318955)', diff saved to https://phabricator.wikimedia.org/P37155 and previous config saved to /var/cache/conftool/dbconfig/20221031-125350-ladsgroup.json
  • 12:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 12:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37154 and previous config saved to /var/cache/conftool/dbconfig/20221031-125329-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P37153 and previous config saved to /var/cache/conftool/dbconfig/20221031-125123-ladsgroup.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P37152 and previous config saved to /var/cache/conftool/dbconfig/20221031-124836-marostegui.json
  • 12:47 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37151 and previous config saved to /var/cache/conftool/dbconfig/20221031-124711-ladsgroup.json
  • 12:46 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 12:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 12:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318955)', diff saved to https://phabricator.wikimedia.org/P37150 and previous config saved to /var/cache/conftool/dbconfig/20221031-124015-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P37149 and previous config saved to /var/cache/conftool/dbconfig/20221031-123822-ladsgroup.json
  • 12:36 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 12:36 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P37148 and previous config saved to /var/cache/conftool/dbconfig/20221031-123616-ladsgroup.json
  • 12:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37147 and previous config saved to /var/cache/conftool/dbconfig/20221031-123330-marostegui.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37146 and previous config saved to /var/cache/conftool/dbconfig/20221031-123222-marostegui.json
  • 12:32 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37145 and previous config saved to /var/cache/conftool/dbconfig/20221031-123211-marostegui.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37144 and previous config saved to /var/cache/conftool/dbconfig/20221031-123204-ladsgroup.json
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P37143 and previous config saved to /var/cache/conftool/dbconfig/20221031-122314-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37142 and previous config saved to /var/cache/conftool/dbconfig/20221031-122109-ladsgroup.json
  • 12:18 gehel: repooling wdqs1007 - caught up on lag - T322010
  • 12:17 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P37141 and previous config saved to /var/cache/conftool/dbconfig/20221031-121705-marostegui.json
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37140 and previous config saved to /var/cache/conftool/dbconfig/20221031-121658-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T318605)', diff saved to https://phabricator.wikimedia.org/P37139 and previous config saved to /var/cache/conftool/dbconfig/20221031-121108-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37138 and previous config saved to /var/cache/conftool/dbconfig/20221031-121043-ladsgroup.json
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37137 and previous config saved to /var/cache/conftool/dbconfig/20221031-120807-ladsgroup.json
  • 12:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37136 and previous config saved to /var/cache/conftool/dbconfig/20221031-120644-ladsgroup.json
  • 12:01 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P37135 and previous config saved to /var/cache/conftool/dbconfig/20221031-120158-marostegui.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T318955)', diff saved to https://phabricator.wikimedia.org/P37134 and previous config saved to /var/cache/conftool/dbconfig/20221031-115639-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37133 and previous config saved to /var/cache/conftool/dbconfig/20221031-115618-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P37132 and previous config saved to /var/cache/conftool/dbconfig/20221031-115536-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P37131 and previous config saved to /var/cache/conftool/dbconfig/20221031-115138-ladsgroup.json
  • 11:46 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37130 and previous config saved to /var/cache/conftool/dbconfig/20221031-114652-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37129 and previous config saved to /var/cache/conftool/dbconfig/20221031-114443-marostegui.json
  • 11:44 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:44 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321123)', diff saved to https://phabricator.wikimedia.org/P37128 and previous config saved to /var/cache/conftool/dbconfig/20221031-114337-marostegui.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P37127 and previous config saved to /var/cache/conftool/dbconfig/20221031-114111-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P37126 and previous config saved to /var/cache/conftool/dbconfig/20221031-114030-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T318955)', diff saved to https://phabricator.wikimedia.org/P37125 and previous config saved to /var/cache/conftool/dbconfig/20221031-113959-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37124 and previous config saved to /var/cache/conftool/dbconfig/20221031-113938-ladsgroup.json
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P37123 and previous config saved to /var/cache/conftool/dbconfig/20221031-113631-ladsgroup.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P37122 and previous config saved to /var/cache/conftool/dbconfig/20221031-112831-marostegui.json
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P37121 and previous config saved to /var/cache/conftool/dbconfig/20221031-112605-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37120 and previous config saved to /var/cache/conftool/dbconfig/20221031-112523-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P37119 and previous config saved to /var/cache/conftool/dbconfig/20221031-112431-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37118 and previous config saved to /var/cache/conftool/dbconfig/20221031-112125-ladsgroup.json
  • 11:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37117 and previous config saved to /var/cache/conftool/dbconfig/20221031-111641-ladsgroup.json
  • 11:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 11:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P37116 and previous config saved to /var/cache/conftool/dbconfig/20221031-111324-marostegui.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T318605)', diff saved to https://phabricator.wikimedia.org/P37115 and previous config saved to /var/cache/conftool/dbconfig/20221031-111153-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37114 and previous config saved to /var/cache/conftool/dbconfig/20221031-111132-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37113 and previous config saved to /var/cache/conftool/dbconfig/20221031-111058-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P37112 and previous config saved to /var/cache/conftool/dbconfig/20221031-110925-ladsgroup.json
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T318955)', diff saved to https://phabricator.wikimedia.org/P37111 and previous config saved to /var/cache/conftool/dbconfig/20221031-110003-ladsgroup.json
  • 10:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 10:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37110 and previous config saved to /var/cache/conftool/dbconfig/20221031-105941-ladsgroup.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321123)', diff saved to https://phabricator.wikimedia.org/P37109 and previous config saved to /var/cache/conftool/dbconfig/20221031-105818-marostegui.json
  • 10:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P37108 and previous config saved to /var/cache/conftool/dbconfig/20221031-105625-ladsgroup.json
  • 10:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 10:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37107 and previous config saved to /var/cache/conftool/dbconfig/20221031-105605-marostegui.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37106 and previous config saved to /var/cache/conftool/dbconfig/20221031-105418-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P37105 and previous config saved to /var/cache/conftool/dbconfig/20221031-104435-ladsgroup.json
  • 10:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37104 and previous config saved to /var/cache/conftool/dbconfig/20221031-104415-ladsgroup.json
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T318955)', diff saved to https://phabricator.wikimedia.org/P37103 and previous config saved to /var/cache/conftool/dbconfig/20221031-104238-ladsgroup.json
  • 10:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 10:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37102 and previous config saved to /var/cache/conftool/dbconfig/20221031-104217-ladsgroup.json
  • 10:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P37101 and previous config saved to /var/cache/conftool/dbconfig/20221031-104119-ladsgroup.json
  • 10:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P37100 and previous config saved to /var/cache/conftool/dbconfig/20221031-104059-marostegui.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P37099 and previous config saved to /var/cache/conftool/dbconfig/20221031-102928-ladsgroup.json
  • 10:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37098 and previous config saved to /var/cache/conftool/dbconfig/20221031-102908-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P37097 and previous config saved to /var/cache/conftool/dbconfig/20221031-102710-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T318605)', diff saved to https://phabricator.wikimedia.org/P37096 and previous config saved to /var/cache/conftool/dbconfig/20221031-102627-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37095 and previous config saved to /var/cache/conftool/dbconfig/20221031-102612-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37094 and previous config saved to /var/cache/conftool/dbconfig/20221031-102606-ladsgroup.json
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P37093 and previous config saved to /var/cache/conftool/dbconfig/20221031-102552-marostegui.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37092 and previous config saved to /var/cache/conftool/dbconfig/20221031-101422-ladsgroup.json
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P37091 and previous config saved to /var/cache/conftool/dbconfig/20221031-101402-ladsgroup.json
  • 10:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P37090 and previous config saved to /var/cache/conftool/dbconfig/20221031-101203-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P37089 and previous config saved to /var/cache/conftool/dbconfig/20221031-101059-ladsgroup.json
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37088 and previous config saved to /var/cache/conftool/dbconfig/20221031-101046-marostegui.json
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37087 and previous config saved to /var/cache/conftool/dbconfig/20221031-100935-marostegui.json
  • 10:09 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37086 and previous config saved to /var/cache/conftool/dbconfig/20221031-100913-marostegui.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T318955)', diff saved to https://phabricator.wikimedia.org/P37085 and previous config saved to /var/cache/conftool/dbconfig/20221031-100316-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37084 and previous config saved to /var/cache/conftool/dbconfig/20221031-100255-ladsgroup.json
  • 09:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37083 and previous config saved to /var/cache/conftool/dbconfig/20221031-095855-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37082 and previous config saved to /var/cache/conftool/dbconfig/20221031-095657-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P37081 and previous config saved to /var/cache/conftool/dbconfig/20221031-095551-ladsgroup.json
  • 09:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P37080 and previous config saved to /var/cache/conftool/dbconfig/20221031-095407-marostegui.json
  • 09:52 gehel: depooling wdqs1007 while it catches up on lag - T322010
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P37079 and previous config saved to /var/cache/conftool/dbconfig/20221031-094748-ladsgroup.json
  • 09:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37078 and previous config saved to /var/cache/conftool/dbconfig/20221031-094501-ladsgroup.json
  • 09:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37077 and previous config saved to /var/cache/conftool/dbconfig/20221031-094439-ladsgroup.json
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37076 and previous config saved to /var/cache/conftool/dbconfig/20221031-094045-ladsgroup.json
  • 09:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P37075 and previous config saved to /var/cache/conftool/dbconfig/20221031-093900-marostegui.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P37074 and previous config saved to /var/cache/conftool/dbconfig/20221031-093242-ladsgroup.json
  • 09:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T318605)', diff saved to https://phabricator.wikimedia.org/P37073 and previous config saved to /var/cache/conftool/dbconfig/20221031-093102-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T318605)', diff saved to https://phabricator.wikimedia.org/P37072 and previous config saved to /var/cache/conftool/dbconfig/20221031-093012-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P37071 and previous config saved to /var/cache/conftool/dbconfig/20221031-092933-ladsgroup.json
  • 09:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37070 and previous config saved to /var/cache/conftool/dbconfig/20221031-092354-marostegui.json
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T321123)', diff saved to https://phabricator.wikimedia.org/P37069 and previous config saved to /var/cache/conftool/dbconfig/20221031-092242-marostegui.json
  • 09:22 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 09:22 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37068 and previous config saved to /var/cache/conftool/dbconfig/20221031-092221-marostegui.json
  • 09:17 Emperor: set thanos ring replicas to 3.40 T311690
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37067 and previous config saved to /var/cache/conftool/dbconfig/20221031-091735-ladsgroup.json
  • 09:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P37066 and previous config saved to /var/cache/conftool/dbconfig/20221031-091426-ladsgroup.json
  • 09:07 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P37065 and previous config saved to /var/cache/conftool/dbconfig/20221031-090714-marostegui.json
  • 09:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T318955)', diff saved to https://phabricator.wikimedia.org/P37064 and previous config saved to /var/cache/conftool/dbconfig/20221031-090640-ladsgroup.json
  • 09:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 09:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37063 and previous config saved to /var/cache/conftool/dbconfig/20221031-085920-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P37062 and previous config saved to /var/cache/conftool/dbconfig/20221031-085839-ladsgroup.json
  • 08:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 08:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 08:52 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P37061 and previous config saved to /var/cache/conftool/dbconfig/20221031-085208-marostegui.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:48 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T318955)', diff saved to https://phabricator.wikimedia.org/P37060 and previous config saved to /var/cache/conftool/dbconfig/20221031-084751-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37059 and previous config saved to /var/cache/conftool/dbconfig/20221031-083701-marostegui.json
  • 08:34 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T321123)', diff saved to https://phabricator.wikimedia.org/P37058 and previous config saved to /var/cache/conftool/dbconfig/20221031-083449-marostegui.json
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:34 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 08:33 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37057 and previous config saved to /var/cache/conftool/dbconfig/20221031-083342-marostegui.json
  • 08:18 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P37056 and previous config saved to /var/cache/conftool/dbconfig/20221031-081836-marostegui.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P37055 and previous config saved to /var/cache/conftool/dbconfig/20221031-080329-marostegui.json
  • 07:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37054 and previous config saved to /var/cache/conftool/dbconfig/20221031-074823-marostegui.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T321123)', diff saved to https://phabricator.wikimedia.org/P37053 and previous config saved to /var/cache/conftool/dbconfig/20221031-074611-marostegui.json
  • 07:46 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37052 and previous config saved to /var/cache/conftool/dbconfig/20221031-074549-marostegui.json
  • 07:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P37051 and previous config saved to /var/cache/conftool/dbconfig/20221031-073042-marostegui.json
  • 07:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P37050 and previous config saved to /var/cache/conftool/dbconfig/20221031-071536-marostegui.json
  • 07:14 kartik@deploy1002: Finished scap: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289) (duration: 06m 48s)
  • 07:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:08 kartik@deploy1002: kartik and kartik: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:07 kartik@deploy1002: Started scap: Backport for Enable Section Translation in Hawaiian, Pashto and Xhosa WPs (T317289)
  • 07:02 ryankemper: [WDQS] `ryankemper@wdqs1007:~$ sudo systemctl restart wdqs-blazegraph.service`
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37049 and previous config saved to /var/cache/conftool/dbconfig/20221031-070029-marostegui.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T321123)', diff saved to https://phabricator.wikimedia.org/P37048 and previous config saved to /var/cache/conftool/dbconfig/20221031-065817-marostegui.json
  • 06:58 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:58 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37047 and previous config saved to /var/cache/conftool/dbconfig/20221031-065756-marostegui.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P37046 and previous config saved to /var/cache/conftool/dbconfig/20221031-064249-marostegui.json
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37044 and previous config saved to /var/cache/conftool/dbconfig/20221031-061236-marostegui.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P37043 and previous config saved to /var/cache/conftool/dbconfig/20221031-061026-marostegui.json
  • 06:10 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:09 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 06:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1122.eqiad.wmnet with reason: Maintenance
  • 05:42 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 05:41 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply

2022-10-29

  • 11:25 taavi: deploy patch for T321971
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply

2022-10-28

  • 20:42 mutante: clouddumps* - deployed gerrit:848444 - as somewhat expected, it fails, most likely because the project dirs are not automatically created before rsync runs for the first time - T57503
  • 20:37 mutante: clouddumps1001 - puppet run after merging gerrit:848441 for kiwix, changed ferm status from "stopped" to "running"; manually ran 'sudo systemctl start kiwix-mirror-update' - T57503
  • 19:17 mutante: contint* - changing source for scap repo to gitlab - gerrit:850246 T321847
  • 18:54 ebernhardson@deploy1002: Finished deploy [wikimedia/discovery/analytics@2326f9c]: Import cirrus indexes to hdfs (duration: 02m 07s)
  • 18:52 ebernhardson@deploy1002: Started deploy [wikimedia/discovery/analytics@2326f9c]: Import cirrus indexes to hdfs
  • 18:44 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:11 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 18:08 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 18:08 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4052.ulsfo.wmnet,service=varnish-fe
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4052.ulsfo.wmnet,service=ats-tls
  • 17:31 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4052.ulsfo.wmnet,service=ats-be
  • 17:28 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4052.ulsfo.wmnet with OS buster
  • 17:09 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 05s)
  • 17:09 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 17:07 xcollazo@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 11s)
  • 17:07 xcollazo@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 17:02 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:57 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4052.ulsfo.wmnet with reason: host reimage
  • 16:38 mforns@deploy1002: Finished deploy [airflow-dags/analytics@62b4181]: testing scap since we are having problems with other instances (duration: 00m 04s)
  • 16:38 mforns@deploy1002: Started deploy [airflow-dags/analytics@62b4181]: testing scap since we are having problems with other instances
  • 16:31 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:31 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 16:31 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37038 and previous config saved to /var/cache/conftool/dbconfig/20221028-163102-marostegui.json
  • 16:29 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS buster
  • 16:29 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:27 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4052.ulsfo.wmnet with OS buster
  • 16:27 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:24 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 16:22 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 16:21 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 16:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:16 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37037 and previous config saved to /var/cache/conftool/dbconfig/20221028-161555-marostegui.json
  • 16:14 cjming: deployed ReadingLists on beta cluster for authenticated users - https://gerrit.wikimedia.org/r/850516 (https://phabricator.wikimedia.org/T317935)
  • 16:13 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 16:07 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:05 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:02 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 16:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P37036 and previous config saved to /var/cache/conftool/dbconfig/20221028-160047-marostegui.json
  • 15:50 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 15:50 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 15:49 robh@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 15:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37035 and previous config saved to /var/cache/conftool/dbconfig/20221028-154541-marostegui.json
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1197 (T321123)', diff saved to https://phabricator.wikimedia.org/P37034 and previous config saved to /var/cache/conftool/dbconfig/20221028-154328-marostegui.json
  • 15:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 15:43 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37033 and previous config saved to /var/cache/conftool/dbconfig/20221028-154307-marostegui.json
  • 15:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37031 and previous config saved to /var/cache/conftool/dbconfig/20221028-152800-marostegui.json
  • 15:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P37030 and previous config saved to /var/cache/conftool/dbconfig/20221028-151252-marostegui.json
  • 15:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 14:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37029 and previous config saved to /var/cache/conftool/dbconfig/20221028-145746-marostegui.json
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T321123)', diff saved to https://phabricator.wikimedia.org/P37028 and previous config saved to /var/cache/conftool/dbconfig/20221028-145533-marostegui.json
  • 14:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 14:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37027 and previous config saved to /var/cache/conftool/dbconfig/20221028-145512-marostegui.json
  • 14:53 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 14:42 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 14:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37026 and previous config saved to /var/cache/conftool/dbconfig/20221028-144005-marostegui.json
  • 14:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 14:37 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to plain
  • 14:24 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P37025 and previous config saved to /var/cache/conftool/dbconfig/20221028-142459-marostegui.json
  • 14:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37024 and previous config saved to /var/cache/conftool/dbconfig/20221028-140952-marostegui.json
  • 14:06 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T321123)', diff saved to https://phabricator.wikimedia.org/P37023 and previous config saved to /var/cache/conftool/dbconfig/20221028-140613-marostegui.json
  • 14:06 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 14:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37022 and previous config saved to /var/cache/conftool/dbconfig/20221028-140552-marostegui.json
  • 13:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37021 and previous config saved to /var/cache/conftool/dbconfig/20221028-135045-marostegui.json
  • 13:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P37020 and previous config saved to /var/cache/conftool/dbconfig/20221028-133538-marostegui.json
  • 13:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 13:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37019 and previous config saved to /var/cache/conftool/dbconfig/20221028-132032-marostegui.json
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P37018 and previous config saved to /var/cache/conftool/dbconfig/20221028-131920-marostegui.json
  • 13:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37017 and previous config saved to /var/cache/conftool/dbconfig/20221028-131905-root.json
  • 13:19 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321123)', diff saved to https://phabricator.wikimedia.org/P37016 and previous config saved to /var/cache/conftool/dbconfig/20221028-131858-marostegui.json
  • 13:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37015 and previous config saved to /var/cache/conftool/dbconfig/20221028-131353-root.json
  • 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1002.eqiad.wmnet to drbd
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37014 and previous config saved to /var/cache/conftool/dbconfig/20221028-130851-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37009 and previous config saved to /var/cache/conftool/dbconfig/20221028-124849-root.json
  • 12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P37008 and previous config saved to /var/cache/conftool/dbconfig/20221028-124845-marostegui.json
  • 12:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37007 and previous config saved to /var/cache/conftool/dbconfig/20221028-124343-root.json
  • 12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P37006 and previous config saved to /var/cache/conftool/dbconfig/20221028-123842-root.json
  • 12:15 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36998 and previous config saved to /var/cache/conftool/dbconfig/20221028-121557-marostegui.json
  • 12:13 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36997 and previous config saved to /var/cache/conftool/dbconfig/20221028-121333-root.json
  • 12:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36996 and previous config saved to /var/cache/conftool/dbconfig/20221028-120832-root.json
  • 12:03 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36995 and previous config saved to /var/cache/conftool/dbconfig/20221028-120334-root.json
  • 12:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36994 and previous config saved to /var/cache/conftool/dbconfig/20221028-120050-marostegui.json
  • 11:58 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36993 and previous config saved to /var/cache/conftool/dbconfig/20221028-115828-root.json
  • 11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36992 and previous config saved to /var/cache/conftool/dbconfig/20221028-115327-root.json
  • 11:48 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36991 and previous config saved to /var/cache/conftool/dbconfig/20221028-114829-root.json
  • 11:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321123)', diff saved to https://phabricator.wikimedia.org/P36990 and previous config saved to /var/cache/conftool/dbconfig/20221028-114544-marostegui.json
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T321123)', diff saved to https://phabricator.wikimedia.org/P36989 and previous config saved to /var/cache/conftool/dbconfig/20221028-114332-marostegui.json
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36988 and previous config saved to /var/cache/conftool/dbconfig/20221028-114323-root.json
  • 11:43 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 11:43 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 11:42 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36987 and previous config saved to /var/cache/conftool/dbconfig/20221028-114253-marostegui.json
  • 11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36986 and previous config saved to /var/cache/conftool/dbconfig/20221028-113822-root.json
  • 11:33 marostegui@cumin1001: dbctl commit (dc=all): 'es1028 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36985 and previous config saved to /var/cache/conftool/dbconfig/20221028-113324-root.json
  • 11:28 marostegui@cumin1001: dbctl commit (dc=all): 'es1027 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36984 and previous config saved to /var/cache/conftool/dbconfig/20221028-112818-root.json
  • 11:27 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P36983 and previous config saved to /var/cache/conftool/dbconfig/20221028-112746-marostegui.json
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4003.ulsfo.wmnet
  • 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36982 and previous config saved to /var/cache/conftool/dbconfig/20221028-112317-root.json
  • 11:20 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 11:18 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1026 es1027 es1028 for upgrade', diff saved to https://phabricator.wikimedia.org/P36981 and previous config saved to /var/cache/conftool/dbconfig/20221028-111805-root.json
  • 11:17 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es1029 as es1 master, es1030 as es2 master, es1031 as es3 master', diff saved to https://phabricator.wikimedia.org/P36980 and previous config saved to /var/cache/conftool/dbconfig/20221028-111707-marostegui.json
  • 11:15 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:12 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P36979 and previous config saved to /var/cache/conftool/dbconfig/20221028-111240-marostegui.json
  • 11:11 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4003.ulsfo.wmnet
  • 11:05 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@c849762]: (no justification provided) (duration: 00m 15s)
  • 11:05 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@c849762]: (no justification provided)
  • 10:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 10:57 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4003.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 10:57 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36978 and previous config saved to /var/cache/conftool/dbconfig/20221028-105733-marostegui.json
  • 10:55 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1146:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36977 and previous config saved to /var/cache/conftool/dbconfig/20221028-105520-marostegui.json
  • 10:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1146.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36976 and previous config saved to /var/cache/conftool/dbconfig/20221028-105438-marostegui.json
  • 10:39 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36975 and previous config saved to /var/cache/conftool/dbconfig/20221028-103932-marostegui.json
  • 10:14 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 10:09 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36973 and previous config saved to /var/cache/conftool/dbconfig/20221028-100918-marostegui.json
  • 10:07 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T321123)', diff saved to https://phabricator.wikimedia.org/P36972 and previous config saved to /var/cache/conftool/dbconfig/20221028-100706-marostegui.json
  • 10:07 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 10:06 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36971 and previous config saved to /var/cache/conftool/dbconfig/20221028-100644-marostegui.json
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
  • 09:53 moritzm: drain ganeti4003 for eventual decom T317247
  • 09:51 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P36970 and previous config saved to /var/cache/conftool/dbconfig/20221028-095138-marostegui.json
  • 09:46 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
  • 09:36 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P36969 and previous config saved to /var/cache/conftool/dbconfig/20221028-093631-marostegui.json
  • 09:21 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36968 and previous config saved to /var/cache/conftool/dbconfig/20221028-092125-marostegui.json
  • 09:19 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36967 and previous config saved to /var/cache/conftool/dbconfig/20221028-091912-marostegui.json
  • 09:19 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 09:17 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:15 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 09:15 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2107.codfw.wmnet with reason: Maintenance
  • 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
  • 09:05 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4006.ulsfo.wmnet to cluster eqiad and group A
  • 09:05 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster eqiad and group A
  • 09:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 08:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 08:20 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:17 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 07:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 05:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 05:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36965 and previous config saved to /var/cache/conftool/dbconfig/20221028-051110-ladsgroup.json
  • 04:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36964 and previous config saved to /var/cache/conftool/dbconfig/20221028-045603-ladsgroup.json
  • 04:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36963 and previous config saved to /var/cache/conftool/dbconfig/20221028-044057-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36962 and previous config saved to /var/cache/conftool/dbconfig/20221028-042550-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T318950)', diff saved to https://phabricator.wikimedia.org/P36961 and previous config saved to /var/cache/conftool/dbconfig/20221028-042443-ladsgroup.json
  • 04:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 04:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T318950)', diff saved to https://phabricator.wikimedia.org/P36960 and previous config saved to /var/cache/conftool/dbconfig/20221028-042421-ladsgroup.json
  • 04:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36959 and previous config saved to /var/cache/conftool/dbconfig/20221028-040915-ladsgroup.json
  • 03:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36958 and previous config saved to /var/cache/conftool/dbconfig/20221028-035409-ladsgroup.json
  • 03:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T318950)', diff saved to https://phabricator.wikimedia.org/P36957 and previous config saved to /var/cache/conftool/dbconfig/20221028-033902-ladsgroup.json
  • 03:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36954 and previous config saved to /var/cache/conftool/dbconfig/20221028-032127-ladsgroup.json
  • 03:17 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 03:12 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 03:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36953 and previous config saved to /var/cache/conftool/dbconfig/20221028-030620-ladsgroup.json
  • 03:05 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 02:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T318950)', diff saved to https://phabricator.wikimedia.org/P36952 and previous config saved to /var/cache/conftool/dbconfig/20221028-025113-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T318950)', diff saved to https://phabricator.wikimedia.org/P36951 and previous config saved to /var/cache/conftool/dbconfig/20221028-025006-ladsgroup.json
  • 02:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 02:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36950 and previous config saved to /var/cache/conftool/dbconfig/20221028-024944-ladsgroup.json
  • 02:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36949 and previous config saved to /var/cache/conftool/dbconfig/20221028-023438-ladsgroup.json
  • 02:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36948 and previous config saved to /var/cache/conftool/dbconfig/20221028-021932-ladsgroup.json
  • 02:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36947 and previous config saved to /var/cache/conftool/dbconfig/20221028-020425-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T318950)', diff saved to https://phabricator.wikimedia.org/P36946 and previous config saved to /var/cache/conftool/dbconfig/20221028-020117-ladsgroup.json
  • 02:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36945 and previous config saved to /var/cache/conftool/dbconfig/20221028-020024-ladsgroup.json
  • 01:55 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4048.ulsfo.wmnet with OS buster
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36944 and previous config saved to /var/cache/conftool/dbconfig/20221028-014517-ladsgroup.json
  • 01:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36942 and previous config saved to /var/cache/conftool/dbconfig/20221028-011505-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T318950)', diff saved to https://phabricator.wikimedia.org/P36941 and previous config saved to /var/cache/conftool/dbconfig/20221028-011357-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36940 and previous config saved to /var/cache/conftool/dbconfig/20221028-011335-ladsgroup.json
  • 01:13 ejegg: civicrm upgraded from 4cb2d91e to 6f511710
  • 01:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 00:59 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4048.ulsfo.wmnet with OS buster
  • 00:58 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 00:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36939 and previous config saved to /var/cache/conftool/dbconfig/20221028-005829-ladsgroup.json
  • 00:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS buster
  • 00:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36938 and previous config saved to /var/cache/conftool/dbconfig/20221028-004322-ladsgroup.json
  • 00:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 00:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36937 and previous config saved to /var/cache/conftool/dbconfig/20221028-002816-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T318950)', diff saved to https://phabricator.wikimedia.org/P36936 and previous config saved to /var/cache/conftool/dbconfig/20221028-002708-ladsgroup.json
  • 00:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36935 and previous config saved to /var/cache/conftool/dbconfig/20221028-002631-ladsgroup.json
  • 00:26 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P36934 and previous config saved to /var/cache/conftool/dbconfig/20221028-001124-ladsgroup.json
  • 00:00 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster
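
The es1026, es1027 and es1028 entries above record a staged repool after the upgrade, with the pooled percentage raised step by step. The following is a minimal sketch, not part of the original log, of how those steps could be pulled out of the entries for review. The function name `repool_steps` and the regular expression are illustrative assumptions based only on the line format visible in this log, and the sample lines are abbreviated copies of entries above.

```python
import re
from collections import defaultdict

# Assumed pattern for "(re)pooling @ N%" dbctl commit entries, derived only
# from the line format visible in this log (not from dbctl itself).
REPOOL = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<user>\S+)@(?P<host>\S+): "
    r"dbctl commit \(dc=all\): "
    r"'(?P<db>\S+) \(re\)pooling @ (?P<pct>\d+)%"
)


def repool_steps(lines):
    """Group (time, percentage) pairs by database host, oldest first."""
    steps = defaultdict(list)
    for line in lines:
        match = REPOOL.search(line)
        if match:
            steps[match["db"]].append((match["time"], int(match["pct"])))
    # The log lists newest entries first; HH:MM strings sort correctly
    # within a single day, so sorting restores chronological order.
    return {db: sorted(pairs) for db, pairs in steps.items()}


# Abbreviated copies of entries from 2022-10-28 (trailing text omitted).
sample = [
    "13:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 100%: After upgrade'",
    "12:48 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162'",
    "12:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 50%: After upgrade'",
    "12:08 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 10%: After upgrade'",
    "11:53 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 5%: After upgrade'",
    "11:38 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 3%: After upgrade'",
    "11:23 marostegui@cumin1001: dbctl commit (dc=all): 'es1026 (re)pooling @ 1%: After upgrade'",
]

for db, pairs in repool_steps(sample).items():
    print(db, pairs)
```

Entries such as 'Repooling after maintenance db1162' carry no percentage in the log, so the sketch skips them rather than guessing a value.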

2022-10-27

  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132', diff saved to https://phabricator.wikimedia.org/P36933 and previous config saved to /var/cache/conftool/dbconfig/20221027-235618-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36932 and previous config saved to /var/cache/conftool/dbconfig/20221027-234111-ladsgroup.json
  • 23:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1132 (T318950)', diff saved to https://phabricator.wikimedia.org/P36931 and previous config saved to /var/cache/conftool/dbconfig/20221027-233903-ladsgroup.json
  • 23:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1132.eqiad.wmnet with reason: Maintenance
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36930 and previous config saved to /var/cache/conftool/dbconfig/20221027-233842-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36929 and previous config saved to /var/cache/conftool/dbconfig/20221027-232335-ladsgroup.json
  • 23:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36928 and previous config saved to /var/cache/conftool/dbconfig/20221027-230828-ladsgroup.json
  • 23:00 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1006
  • 23:00 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:57 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 22:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36927 and previous config saved to /var/cache/conftool/dbconfig/20221027-225322-ladsgroup.json
  • 22:53 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1006
  • 22:51 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts labstore1007
  • 22:51 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:49 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 22:44 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts labstore1007
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36926 and previous config saved to /var/cache/conftool/dbconfig/20221027-224413-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 22:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36925 and previous config saved to /var/cache/conftool/dbconfig/20221027-224350-ladsgroup.json
  • 22:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36924 and previous config saved to /var/cache/conftool/dbconfig/20221027-222844-ladsgroup.json
  • 22:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36923 and previous config saved to /var/cache/conftool/dbconfig/20221027-221337-ladsgroup.json
  • 21:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36922 and previous config saved to /var/cache/conftool/dbconfig/20221027-215831-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T318950)', diff saved to https://phabricator.wikimedia.org/P36921 and previous config saved to /var/cache/conftool/dbconfig/20221027-215723-ladsgroup.json
  • 21:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 21:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36920 and previous config saved to /var/cache/conftool/dbconfig/20221027-215701-ladsgroup.json
  • 21:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 21:46 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 21:46 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4052
  • 21:46 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4052
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36919 and previous config saved to /var/cache/conftool/dbconfig/20221027-214154-ladsgroup.json
  • 21:41 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 21:34 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36918 and previous config saved to /var/cache/conftool/dbconfig/20221027-212648-ladsgroup.json
  • 21:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bullseye
  • 21:12 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36917 and previous config saved to /var/cache/conftool/dbconfig/20221027-211142-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T318950)', diff saved to https://phabricator.wikimedia.org/P36916 and previous config saved to /var/cache/conftool/dbconfig/20221027-211034-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 21:10 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36915 and previous config saved to /var/cache/conftool/dbconfig/20221027-211012-ladsgroup.json
  • 21:02 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:56 sukhe: sudo ipmitool -I lanplus -H "cp4052.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 20:56 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 20:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4052.ulsfo.wmnet with OS buster
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36914 and previous config saved to /var/cache/conftool/dbconfig/20221027-205505-ladsgroup.json
  • 20:53 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
  • 20:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4052.ulsfo.wmnet with OS buster
  • 20:47 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for enwiki, enwiktionary (T300770)
  • 20:47 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki (T300770)
  • 20:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36912 and previous config saved to /var/cache/conftool/dbconfig/20221027-202452-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T318950)', diff saved to https://phabricator.wikimedia.org/P36911 and previous config saved to /var/cache/conftool/dbconfig/20221027-202345-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36910 and previous config saved to /var/cache/conftool/dbconfig/20221027-202323-ladsgroup.json
  • 20:16 kindrobot: End of UTC late backport deployment window
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 20:15 kindrobot@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on enwiki (T318333) (duration: 06m 32s)
  • 20:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 20:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 20:09 kindrobot@deploy1002: kindrobot and dani: Backport for Deploy Research Incentive survey on enwiki (T318333) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 kindrobot@deploy1002: Started scap: Backport for Deploy Research Incentive survey on enwiki (T318333)
  • 20:08 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS buster
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36909 and previous config saved to /var/cache/conftool/dbconfig/20221027-200817-ladsgroup.json
  • 20:08 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:59 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36908 and previous config saved to /var/cache/conftool/dbconfig/20221027-195634-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36907 and previous config saved to /var/cache/conftool/dbconfig/20221027-195310-ladsgroup.json
  • 19:51 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:50 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:50 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:49 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36906 and previous config saved to /var/cache/conftool/dbconfig/20221027-194127-ladsgroup.json
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36905 and previous config saved to /var/cache/conftool/dbconfig/20221027-193803-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T318950)', diff saved to https://phabricator.wikimedia.org/P36904 and previous config saved to /var/cache/conftool/dbconfig/20221027-193656-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36903 and previous config saved to /var/cache/conftool/dbconfig/20221027-193617-ladsgroup.json
  • 19:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36902 and previous config saved to /var/cache/conftool/dbconfig/20221027-192621-ladsgroup.json
  • 19:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P36901 and previous config saved to /var/cache/conftool/dbconfig/20221027-192110-ladsgroup.json
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36900 and previous config saved to /var/cache/conftool/dbconfig/20221027-191114-ladsgroup.json
  • 19:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T318950)', diff saved to https://phabricator.wikimedia.org/P36899 and previous config saved to /var/cache/conftool/dbconfig/20221027-190904-ladsgroup.json
  • 19:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36898 and previous config saved to /var/cache/conftool/dbconfig/20221027-190843-ladsgroup.json
  • 19:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311', diff saved to https://phabricator.wikimedia.org/P36897 and previous config saved to /var/cache/conftool/dbconfig/20221027-190604-ladsgroup.json
  • 18:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36896 and previous config saved to /var/cache/conftool/dbconfig/20221027-185336-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36895 and previous config saved to /var/cache/conftool/dbconfig/20221027-185057-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1105:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36894 and previous config saved to /var/cache/conftool/dbconfig/20221027-184949-ladsgroup.json
  • 18:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1105.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36893 and previous config saved to /var/cache/conftool/dbconfig/20221027-184928-ladsgroup.json
  • 18:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36892 and previous config saved to /var/cache/conftool/dbconfig/20221027-183830-ladsgroup.json
  • 18:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P36891 and previous config saved to /var/cache/conftool/dbconfig/20221027-183421-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36890 and previous config saved to /var/cache/conftool/dbconfig/20221027-182323-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T318950)', diff saved to https://phabricator.wikimedia.org/P36889 and previous config saved to /var/cache/conftool/dbconfig/20221027-182113-ladsgroup.json
  • 18:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36888 and previous config saved to /var/cache/conftool/dbconfig/20221027-182051-ladsgroup.json
  • 18:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P36887 and previous config saved to /var/cache/conftool/dbconfig/20221027-181915-ladsgroup.json
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36886 and previous config saved to /var/cache/conftool/dbconfig/20221027-180545-ladsgroup.json
  • 18:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36885 and previous config saved to /var/cache/conftool/dbconfig/20221027-180408-ladsgroup.json
  • 18:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36884 and previous config saved to /var/cache/conftool/dbconfig/20221027-180301-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 18:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36883 and previous config saved to /var/cache/conftool/dbconfig/20221027-175038-ladsgroup.json
  • 17:45 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:42 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36882 and previous config saved to /var/cache/conftool/dbconfig/20221027-174219-ladsgroup.json
  • 17:42 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4051.ulsfo.wmnet with OS buster
  • 17:39 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4006.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:38 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4006
  • 17:37 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4006
  • 17:37 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4002
  • 17:37 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4002
  • 17:37 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-stretch1001.eqiad.wmnet with OS bullseye
  • 17:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36881 and previous config saved to /var/cache/conftool/dbconfig/20221027-173532-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T318950)', diff saved to https://phabricator.wikimedia.org/P36880 and previous config saved to /var/cache/conftool/dbconfig/20221027-173322-ladsgroup.json
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 17:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36879 and previous config saved to /var/cache/conftool/dbconfig/20221027-173255-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36878 and previous config saved to /var/cache/conftool/dbconfig/20221027-172712-ladsgroup.json
  • 17:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36877 and previous config saved to /var/cache/conftool/dbconfig/20221027-171749-ladsgroup.json
  • 17:14 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36876 and previous config saved to /var/cache/conftool/dbconfig/20221027-171205-ladsgroup.json
  • 17:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4051.ulsfo.wmnet with reason: host reimage
  • 17:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36875 and previous config saved to /var/cache/conftool/dbconfig/20221027-170242-ladsgroup.json
  • 16:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36874 and previous config saved to /var/cache/conftool/dbconfig/20221027-165659-ladsgroup.json
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:53 dancy@deploy1002: Sync cancelled.
  • 16:52 dancy@deploy1002: dancy: testing mw-debug synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:52 dancy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:52 dancy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T318605)', diff saved to https://phabricator.wikimedia.org/P36873 and previous config saved to /var/cache/conftool/dbconfig/20221027-165052-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36872 and previous config saved to /var/cache/conftool/dbconfig/20221027-165031-ladsgroup.json
  • 16:48 dancy@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:48 dancy@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36871 and previous config saved to /var/cache/conftool/dbconfig/20221027-164735-ladsgroup.json
  • 16:47 dancy@deploy1002: Started scap: testing mw-debug
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36870 and previous config saved to /var/cache/conftool/dbconfig/20221027-164626-ladsgroup.json
  • 16:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 16:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36869 and previous config saved to /var/cache/conftool/dbconfig/20221027-164615-ladsgroup.json
  • 16:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4051.ulsfo.wmnet with OS buster
  • 16:39 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4050.ulsfo.wmnet with OS buster
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36868 and previous config saved to /var/cache/conftool/dbconfig/20221027-163524-ladsgroup.json
  • 16:33 dancy@deploy1002: scap failed: CalledProcessError Command 'sudo -u mwbuilder /usr/bin/make -C /srv/mwbuilder/release/make-container-image -f Makefile build-and-push-all-images http_proxy=http://webproxy.eqiad.wmnet:8080 https_proxy=http://webproxy.eqiad.wmnet:8080 GIT_BASE=https://gerrit.wikimedia.org/r/ MW_CONFIG_BRANCH=master workdir_volume=/srv/mediawiki-staging mv_image_name=docker-registry.discovery.wmnet/restric
  • 16:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P36867 and previous config saved to /var/cache/conftool/dbconfig/20221027-163109-ladsgroup.json
  • 16:27 dancy@deploy1002: Started scap: testing mw-debug
  • 16:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2005.codfw.wmnet
  • 16:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2005.codfw.wmnet
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36866 and previous config saved to /var/cache/conftool/dbconfig/20221027-162018-ladsgroup.json
  • 16:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2004.codfw.wmnet
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P36865 and previous config saved to /var/cache/conftool/dbconfig/20221027-161602-ladsgroup.json
  • 16:15 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 16:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2004.codfw.wmnet
  • 16:11 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4050.ulsfo.wmnet with reason: host reimage
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36864 and previous config saved to /var/cache/conftool/dbconfig/20221027-160511-ladsgroup.json
  • 16:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36863 and previous config saved to /var/cache/conftool/dbconfig/20221027-160056-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2167:3311 (T318950)', diff saved to https://phabricator.wikimedia.org/P36862 and previous config saved to /var/cache/conftool/dbconfig/20221027-155946-ladsgroup.json
  • 15:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:59 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main'.
  • 15:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36861 and previous config saved to /var/cache/conftool/dbconfig/20221027-155902-ladsgroup.json
  • 15:55 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2003-dev.codfw.wmnet with OS bullseye
  • 15:47 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:46 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4050.ulsfo.wmnet with OS buster
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36860 and previous config saved to /var/cache/conftool/dbconfig/20221027-154356-ladsgroup.json
  • 15:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS buster
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36859 and previous config saved to /var/cache/conftool/dbconfig/20221027-153143-ladsgroup.json
  • 15:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36858 and previous config saved to /var/cache/conftool/dbconfig/20221027-153121-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36857 and previous config saved to /var/cache/conftool/dbconfig/20221027-152849-ladsgroup.json
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2002.codfw.wmnet
  • 15:26 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs2002.codfw.wmnet
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload
  • 15:26 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2003.codfw.wmnet with reason: data reload
  • 15:26 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload
  • 15:25 bking@cumin2002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on wcqs2002.codfw.wmnet with reason: data reload
  • 15:23 claime: Removed silence ProbeDown instance="mwdebug:4444"
  • 15:23 claime: k8s-experimental mwdebug service switched to new deployment mw-debug
  • 15:22 claime: Unpausing mwdebug k8s deployments
  • 15:19 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 15:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 15:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 15:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36856 and previous config saved to /var/cache/conftool/dbconfig/20221027-151615-ladsgroup.json
  • 15:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36855 and previous config saved to /var/cache/conftool/dbconfig/20221027-151604-marostegui.json
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36854 and previous config saved to /var/cache/conftool/dbconfig/20221027-151343-ladsgroup.json
  • 15:12 claime: Silence ProbeDown instance="mwdebug:4444" for 1h
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T318950)', diff saved to https://phabricator.wikimedia.org/P36853 and previous config saved to /var/cache/conftool/dbconfig/20221027-151133-ladsgroup.json
  • 15:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:11 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 15:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36852 and previous config saved to /var/cache/conftool/dbconfig/20221027-151111-ladsgroup.json
  • 15:07 claime: Pausing mwdebug k8s deployments
  • 15:07 moritzm: installing node-moment security updates
  • 15:07 claime: Switching k8s-experimental mwdebug service
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36851 and previous config saved to /var/cache/conftool/dbconfig/20221027-150108-ladsgroup.json
  • 15:00 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36850 and previous config saved to /var/cache/conftool/dbconfig/20221027-150058-marostegui.json
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36849 and previous config saved to /var/cache/conftool/dbconfig/20221027-145604-ladsgroup.json
  • 14:51 moritzm: installing krb5 bugfix updates from Bullseye point release
  • 14:50 aborrero@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 14:50 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2001-dev.codfw.wmnet with reason: host reimage
  • 14:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS buster
  • 14:48 moritzm: installing twitter-bootstrap4 security updates
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36848 and previous config saved to /var/cache/conftool/dbconfig/20221027-144602-ladsgroup.json
  • 14:45 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36847 and previous config saved to /var/cache/conftool/dbconfig/20221027-144551-marostegui.json
  • 14:45 aborrero@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
  • 14:41 aborrero@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2003-dev.codfw.wmnet with reason: host reimage
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36846 and previous config saved to /var/cache/conftool/dbconfig/20221027-144058-ladsgroup.json
  • 14:34 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS buster
  • 14:30 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36845 and previous config saved to /var/cache/conftool/dbconfig/20221027-143045-marostegui.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T321123)', diff saved to https://phabricator.wikimedia.org/P36844 and previous config saved to /var/cache/conftool/dbconfig/20221027-142656-marostegui.json
  • 14:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 14:26 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36843 and previous config saved to /var/cache/conftool/dbconfig/20221027-142634-marostegui.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36842 and previous config saved to /var/cache/conftool/dbconfig/20221027-142552-ladsgroup.json
  • 14:25 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2003-dev.codfw.wmnet with OS bullseye
  • 14:24 aborrero@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2001-dev.codfw.wmnet with OS bullseye
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T318950)', diff saved to https://phabricator.wikimedia.org/P36841 and previous config saved to /var/cache/conftool/dbconfig/20221027-142342-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36840 and previous config saved to /var/cache/conftool/dbconfig/20221027-142320-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3316 (T318605)', diff saved to https://phabricator.wikimedia.org/P36839 and previous config saved to /var/cache/conftool/dbconfig/20221027-141326-ladsgroup.json
  • 14:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36838 and previous config saved to /var/cache/conftool/dbconfig/20221027-141304-ladsgroup.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36837 and previous config saved to /var/cache/conftool/dbconfig/20221027-141128-marostegui.json
  • 14:08 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 14:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P36836 and previous config saved to /var/cache/conftool/dbconfig/20221027-140814-ladsgroup.json
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36835 and previous config saved to /var/cache/conftool/dbconfig/20221027-140708-ladsgroup.json
  • 14:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 14:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36834 and previous config saved to /var/cache/conftool/dbconfig/20221027-140043-ladsgroup.json
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36833 and previous config saved to /var/cache/conftool/dbconfig/20221027-135757-ladsgroup.json
  • 13:56 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36832 and previous config saved to /var/cache/conftool/dbconfig/20221027-135621-marostegui.json
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P36831 and previous config saved to /var/cache/conftool/dbconfig/20221027-135307-ladsgroup.json
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P36830 and previous config saved to /var/cache/conftool/dbconfig/20221027-135201-ladsgroup.json
  • 13:46 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36829 and previous config saved to /var/cache/conftool/dbconfig/20221027-134537-ladsgroup.json
  • 13:45 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:44 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36828 and previous config saved to /var/cache/conftool/dbconfig/20221027-134251-ladsgroup.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36827 and previous config saved to /var/cache/conftool/dbconfig/20221027-134115-marostegui.json
  • 13:40 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 13:39 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS buster
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36826 and previous config saved to /var/cache/conftool/dbconfig/20221027-133848-marostegui.json
  • 13:38 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 13:38 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36825 and previous config saved to /var/cache/conftool/dbconfig/20221027-133827-marostegui.json
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36824 and previous config saved to /var/cache/conftool/dbconfig/20221027-133801-ladsgroup.json
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022', diff saved to https://phabricator.wikimedia.org/P36823 and previous config saved to /var/cache/conftool/dbconfig/20221027-133654-ladsgroup.json
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T318950)', diff saved to https://phabricator.wikimedia.org/P36822 and previous config saved to /var/cache/conftool/dbconfig/20221027-133551-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36821 and previous config saved to /var/cache/conftool/dbconfig/20221027-133522-ladsgroup.json
  • 13:32 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36820 and previous config saved to /var/cache/conftool/dbconfig/20221027-133031-ladsgroup.json
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Define a default value for wgPageTriageMaxAge (T310974) (duration: 05m 33s)
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36819 and previous config saved to /var/cache/conftool/dbconfig/20221027-132814-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36818 and previous config saved to /var/cache/conftool/dbconfig/20221027-132743-ladsgroup.json
  • 13:24 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for Define a default value for wgPageTriageMaxAge (T310974) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:23 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Define a default value for wgPageTriageMaxAge (T310974)
  • 13:23 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36817 and previous config saved to /var/cache/conftool/dbconfig/20221027-132320-marostegui.json
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:22 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable show nearby feature on de.wikivoyage (T320692) (duration: 05m 35s)
  • 13:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36816 and previous config saved to /var/cache/conftool/dbconfig/20221027-132148-ladsgroup.json
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P36815 and previous config saved to /var/cache/conftool/dbconfig/20221027-132016-ladsgroup.json
  • 13:16 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and wmde-fisch: Backport for Enable show nearby feature on de.wikivoyage (T320692) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:16 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable show nearby feature on de.wikivoyage (T320692)
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36814 and previous config saved to /var/cache/conftool/dbconfig/20221027-131524-ladsgroup.json
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P36813 and previous config saved to /var/cache/conftool/dbconfig/20221027-131308-ladsgroup.json
  • 13:12 vgutierrez: pool cp5007
  • 13:12 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Enable source links on Translation ns on bnwikisource (T53980) (duration: 05m 40s)
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T318950)', diff saved to https://phabricator.wikimedia.org/P36812 and previous config saved to /var/cache/conftool/dbconfig/20221027-131127-ladsgroup.json
  • 13:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36811 and previous config saved to /var/cache/conftool/dbconfig/20221027-131117-ladsgroup.json
  • 13:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36810 and previous config saved to /var/cache/conftool/dbconfig/20221027-130814-marostegui.json
  • 13:06 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and bodhisattwa: Backport for Enable source links on Translation ns on bnwikisource (T53980) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:06 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Enable source links on Translation ns on bnwikisource (T53980)
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P36809 and previous config saved to /var/cache/conftool/dbconfig/20221027-130509-ladsgroup.json
  • 13:03 vgutierrez: depool cp5007
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36808 and previous config saved to /var/cache/conftool/dbconfig/20221027-130135-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36807 and previous config saved to /var/cache/conftool/dbconfig/20221027-130110-ladsgroup.json
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022', diff saved to https://phabricator.wikimedia.org/P36806 and previous config saved to /var/cache/conftool/dbconfig/20221027-125801-ladsgroup.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36805 and previous config saved to /var/cache/conftool/dbconfig/20221027-125610-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T318605)', diff saved to https://phabricator.wikimedia.org/P36804 and previous config saved to /var/cache/conftool/dbconfig/20221027-125456-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:53 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36803 and previous config saved to /var/cache/conftool/dbconfig/20221027-125307-marostegui.json
  • 12:52 ladsgroup@deploy1002: Finished scap: Backport for maintenance: Use $this->waitForReplication() (T298485) (duration: 04m 40s)
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T321123)', diff saved to https://phabricator.wikimedia.org/P36802 and previous config saved to /var/cache/conftool/dbconfig/20221027-125042-marostegui.json
  • 12:50 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36801 and previous config saved to /var/cache/conftool/dbconfig/20221027-125020-marostegui.json
  • 12:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36800 and previous config saved to /var/cache/conftool/dbconfig/20221027-125002-ladsgroup.json
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:48 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for maintenance: Use $this->waitForReplication() (T298485) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36799 and previous config saved to /var/cache/conftool/dbconfig/20221027-124752-ladsgroup.json
  • 12:47 ladsgroup@deploy1002: Started scap: Backport for maintenance: Use $this->waitForReplication() (T298485)
  • 12:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36798 and previous config saved to /var/cache/conftool/dbconfig/20221027-124731-ladsgroup.json
  • 12:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P36797 and previous config saved to /var/cache/conftool/dbconfig/20221027-124603-ladsgroup.json
  • 12:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36796 and previous config saved to /var/cache/conftool/dbconfig/20221027-124255-ladsgroup.json
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36795 and previous config saved to /var/cache/conftool/dbconfig/20221027-124104-ladsgroup.json
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P36794 and previous config saved to /var/cache/conftool/dbconfig/20221027-123513-marostegui.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P36793 and previous config saved to /var/cache/conftool/dbconfig/20221027-123224-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021', diff saved to https://phabricator.wikimedia.org/P36792 and previous config saved to /var/cache/conftool/dbconfig/20221027-123057-ladsgroup.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36791 and previous config saved to /var/cache/conftool/dbconfig/20221027-122557-ladsgroup.json
  • 12:20 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P36790 and previous config saved to /var/cache/conftool/dbconfig/20221027-122007-marostegui.json
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P36789 and previous config saved to /var/cache/conftool/dbconfig/20221027-121717-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36788 and previous config saved to /var/cache/conftool/dbconfig/20221027-121550-ladsgroup.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36787 and previous config saved to /var/cache/conftool/dbconfig/20221027-121441-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36786 and previous config saved to /var/cache/conftool/dbconfig/20221027-121432-root.json
  • 12:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36785 and previous config saved to /var/cache/conftool/dbconfig/20221027-121425-root.json
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1022 (T321312)', diff saved to https://phabricator.wikimedia.org/P36784 and previous config saved to /var/cache/conftool/dbconfig/20221027-121323-ladsgroup.json
  • 12:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1022.eqiad.wmnet with reason: Maintenance
  • 12:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36783 and previous config saved to /var/cache/conftool/dbconfig/20221027-121259-ladsgroup.json
  • 12:05 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36782 and previous config saved to /var/cache/conftool/dbconfig/20221027-120500-marostegui.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T321123)', diff saved to https://phabricator.wikimedia.org/P36781 and previous config saved to /var/cache/conftool/dbconfig/20221027-120234-marostegui.json
  • 12:02 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:02 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 12:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36780 and previous config saved to /var/cache/conftool/dbconfig/20221027-120211-ladsgroup.json
  • 12:01 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on D{lvs200[7-8].codfw.wmnet} and A:lvs
  • 12:00 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on D{lvs200[7-8].codfw.wmnet} and A:lvs
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2116 (T318950)', diff saved to https://phabricator.wikimedia.org/P36778 and previous config saved to /var/cache/conftool/dbconfig/20221027-120001-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36777 and previous config saved to /var/cache/conftool/dbconfig/20221027-115939-ladsgroup.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36776 and previous config saved to /var/cache/conftool/dbconfig/20221027-115936-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36775 and previous config saved to /var/cache/conftool/dbconfig/20221027-115927-root.json
  • 11:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36774 and previous config saved to /var/cache/conftool/dbconfig/20221027-115920-root.json
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P36773 and previous config saved to /var/cache/conftool/dbconfig/20221027-115753-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36772 and previous config saved to /var/cache/conftool/dbconfig/20221027-115157-ladsgroup.json
  • 11:47 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36771 and previous config saved to /var/cache/conftool/dbconfig/20221027-114706-marostegui.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36770 and previous config saved to /var/cache/conftool/dbconfig/20221027-114432-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36769 and previous config saved to /var/cache/conftool/dbconfig/20221027-114422-root.json
  • 11:44 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36768 and previous config saved to /var/cache/conftool/dbconfig/20221027-114416-root.json
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021', diff saved to https://phabricator.wikimedia.org/P36767 and previous config saved to /var/cache/conftool/dbconfig/20221027-114246-ladsgroup.json
  • 11:39 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:38 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36765 and previous config saved to /var/cache/conftool/dbconfig/20221027-113651-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T318950)', diff saved to https://phabricator.wikimedia.org/P36764 and previous config saved to /var/cache/conftool/dbconfig/20221027-113554-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T318950)', diff saved to https://phabricator.wikimedia.org/P36763 and previous config saved to /var/cache/conftool/dbconfig/20221027-113544-ladsgroup.json
  • 11:35 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:35 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of netflow1002.eqiad.wmnet to plain
  • 11:32 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of netflow1002.eqiad.wmnet to plain
  • 11:32 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36762 and previous config saved to /var/cache/conftool/dbconfig/20221027-113159-marostegui.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103', diff saved to https://phabricator.wikimedia.org/P36761 and previous config saved to /var/cache/conftool/dbconfig/20221027-112927-ladsgroup.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36760 and previous config saved to /var/cache/conftool/dbconfig/20221027-112927-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36759 and previous config saved to /var/cache/conftool/dbconfig/20221027-112917-root.json
  • 11:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36758 and previous config saved to /var/cache/conftool/dbconfig/20221027-112911-root.json
  • 11:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36757 and previous config saved to /var/cache/conftool/dbconfig/20221027-112740-ladsgroup.json
  • 11:24 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:24 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36756 and previous config saved to /var/cache/conftool/dbconfig/20221027-112144-ladsgroup.json
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36755 and previous config saved to /var/cache/conftool/dbconfig/20221027-112037-ladsgroup.json
  • 11:16 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321123)', diff saved to https://phabricator.wikimedia.org/P36754 and previous config saved to /var/cache/conftool/dbconfig/20221027-111653-marostegui.json
  • 11:15 oblivian@cumin1001: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:15 oblivian@cumin1001: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs4005.ulsfo.wmnet} and A:lvs
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T321123)', diff saved to https://phabricator.wikimedia.org/P36753 and previous config saved to /var/cache/conftool/dbconfig/20221027-111427-marostegui.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36752 and previous config saved to /var/cache/conftool/dbconfig/20221027-111422-root.json
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36751 and previous config saved to /var/cache/conftool/dbconfig/20221027-111414-ladsgroup.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36750 and previous config saved to /var/cache/conftool/dbconfig/20221027-111412-root.json
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36749 and previous config saved to /var/cache/conftool/dbconfig/20221027-111406-root.json
  • 11:14 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 11:14 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321123)', diff saved to https://phabricator.wikimedia.org/P36748 and previous config saved to /var/cache/conftool/dbconfig/20221027-111401-marostegui.json
  • 11:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2103 (T318950)', diff saved to https://phabricator.wikimedia.org/P36747 and previous config saved to /var/cache/conftool/dbconfig/20221027-111204-ladsgroup.json
  • 11:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2103.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 11:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36746 and previous config saved to /var/cache/conftool/dbconfig/20221027-111009-ladsgroup.json
  • 11:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2021.codfw.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1021 (T321312)', diff saved to https://phabricator.wikimedia.org/P36745 and previous config saved to /var/cache/conftool/dbconfig/20221027-110920-ladsgroup.json
  • 11:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1021.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36744 and previous config saved to /var/cache/conftool/dbconfig/20221027-110638-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36743 and previous config saved to /var/cache/conftool/dbconfig/20221027-110531-ladsgroup.json
  • 11:05 moritzm: installing nodejs security updates on buster
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36742 and previous config saved to /var/cache/conftool/dbconfig/20221027-110301-ladsgroup.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36740 and previous config saved to /var/cache/conftool/dbconfig/20221027-105910-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36739 and previous config saved to /var/cache/conftool/dbconfig/20221027-105907-root.json
  • 10:59 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36738 and previous config saved to /var/cache/conftool/dbconfig/20221027-105901-root.json
  • 10:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36737 and previous config saved to /var/cache/conftool/dbconfig/20221027-105855-marostegui.json
  • 10:35 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1059.eqiad.wmnet
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P36728 and previous config saved to /var/cache/conftool/dbconfig/20221027-103248-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T318605)', diff saved to https://phabricator.wikimedia.org/P36727 and previous config saved to /var/cache/conftool/dbconfig/20221027-103236-ladsgroup.json
  • 10:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 10:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36726 and previous config saved to /var/cache/conftool/dbconfig/20221027-103214-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36725 and previous config saved to /var/cache/conftool/dbconfig/20221027-103110-ladsgroup.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2028 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102852-root.json
  • 10:30 marostegui@cumin1001: dbctl commit (dc=all): 'es2027 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102847-root.json
  • 10:29 marostegui@cumin1001: dbctl commit (dc=all): 'es2026 (re)pooling @ 1%: After upgrade', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221027-102843-root.json
  • 10:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T321123)', diff saved to https://phabricator.wikimedia.org/P36724 and previous config saved to /var/cache/conftool/dbconfig/20221027-102611-marostegui.json
  • 10:26 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36723 and previous config saved to /var/cache/conftool/dbconfig/20221027-102550-marostegui.json
  • 10:22 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2026 es2027 es2028 for upgrade', diff saved to https://phabricator.wikimedia.org/P36722 and previous config saved to /var/cache/conftool/dbconfig/20221027-102209-root.json
  • 10:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host stat1009.eqiad.wmnet
  • 10:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36721 and previous config saved to /var/cache/conftool/dbconfig/20221027-101848-ladsgroup.json
  • 10:18 marostegui@cumin1001: dbctl commit (dc=all): 'Promote es2030 as es1 master, es2031 as es2 master, es2029 as es3 master', diff saved to https://phabricator.wikimedia.org/P36720 and previous config saved to /var/cache/conftool/dbconfig/20221027-101842-marostegui.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36719 and previous config saved to /var/cache/conftool/dbconfig/20221027-101742-ladsgroup.json
  • 10:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36718 and previous config saved to /var/cache/conftool/dbconfig/20221027-101708-ladsgroup.json
  • 10:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36717 and previous config saved to /var/cache/conftool/dbconfig/20221027-101604-ladsgroup.json
  • 10:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host stat1009.eqiad.wmnet
  • 10:14 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master" (duration: 04m 29s)
  • 10:12 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 10:10 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36716 and previous config saved to /var/cache/conftool/dbconfig/20221027-101043-marostegui.json
  • 10:09 marostegui@deploy1002: marostegui and marostegui: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T318950)', diff saved to https://phabricator.wikimedia.org/P36715 and previous config saved to /var/cache/conftool/dbconfig/20221027-100948-ladsgroup.json
  • 10:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:09 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc2014 to pc1 codfw master"
  • 10:09 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 10:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 10:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36714 and previous config saved to /var/cache/conftool/dbconfig/20221027-100915-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36713 and previous config saved to /var/cache/conftool/dbconfig/20221027-100547-ladsgroup.json
  • 10:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P36712 and previous config saved to /var/cache/conftool/dbconfig/20221027-100303-ladsgroup.json
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 10:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36711 and previous config saved to /var/cache/conftool/dbconfig/20221027-100201-ladsgroup.json
  • 10:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T318950)', diff saved to https://phabricator.wikimedia.org/P36710 and previous config saved to /var/cache/conftool/dbconfig/20221027-100057-ladsgroup.json
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 09:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T318950)', diff saved to https://phabricator.wikimedia.org/P36709 and previous config saved to /var/cache/conftool/dbconfig/20221027-095700-ladsgroup.json
  • 09:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36708 and previous config saved to /var/cache/conftool/dbconfig/20221027-095649-ladsgroup.json
  • 09:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 09:55 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36707 and previous config saved to /var/cache/conftool/dbconfig/20221027-095537-marostegui.json
  • 09:54 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 09:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P36706 and previous config saved to /var/cache/conftool/dbconfig/20221027-095408-ladsgroup.json
  • 09:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P36705 and previous config saved to /var/cache/conftool/dbconfig/20221027-095041-ladsgroup.json
  • 09:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025', diff saved to https://phabricator.wikimedia.org/P36704 and previous config saved to /var/cache/conftool/dbconfig/20221027-094756-ladsgroup.json
  • 09:47 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36703 and previous config saved to /var/cache/conftool/dbconfig/20221027-094655-ladsgroup.json
  • 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 09:46 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master (duration: 04m 26s)
  • 09:42 marostegui@deploy1002: marostegui and marostegui: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36702 and previous config saved to /var/cache/conftool/dbconfig/20221027-094143-ladsgroup.json
  • 09:41 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc2014 to pc1 codfw master
  • 09:40 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36701 and previous config saved to /var/cache/conftool/dbconfig/20221027-094030-marostegui.json
  • 09:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P36700 and previous config saved to /var/cache/conftool/dbconfig/20221027-093902-ladsgroup.json
  • 09:38 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 09:38 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T321123)', diff saved to https://phabricator.wikimedia.org/P36699 and previous config saved to /var/cache/conftool/dbconfig/20221027-093804-marostegui.json
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:37 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025', diff saved to https://phabricator.wikimedia.org/P36698 and previous config saved to /var/cache/conftool/dbconfig/20221027-093534-ladsgroup.json
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'Repool db2117', diff saved to https://phabricator.wikimedia.org/P36697 and previous config saved to /var/cache/conftool/dbconfig/20221027-093519-marostegui.json
  • 09:34 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host puppetdb-test2001.codfw.wmnet
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36696 and previous config saved to /var/cache/conftool/dbconfig/20221027-093250-ladsgroup.json
  • 09:28 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36694 and previous config saved to /var/cache/conftool/dbconfig/20221027-092842-marostegui.json
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36693 and previous config saved to /var/cache/conftool/dbconfig/20221027-092636-ladsgroup.json
  • 09:24 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host puppetdb-test2001.codfw.wmnet
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36692 and previous config saved to /var/cache/conftool/dbconfig/20221027-092355-ladsgroup.json
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36691 and previous config saved to /var/cache/conftool/dbconfig/20221027-092028-ladsgroup.json
  • 09:17 moritzm: failover ganeti master in ulsfo to ganeti4008, unblocking future decom of ganeti4003 T317247
  • 09:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T318950)', diff saved to https://phabricator.wikimedia.org/P36689 and previous config saved to /var/cache/conftool/dbconfig/20221027-091603-ladsgroup.json
  • 09:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 09:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36688 and previous config saved to /var/cache/conftool/dbconfig/20221027-091536-ladsgroup.json
  • 09:13 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36687 and previous config saved to /var/cache/conftool/dbconfig/20221027-091336-marostegui.json
  • 09:13 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:13 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T318605)', diff saved to https://phabricator.wikimedia.org/P36686 and previous config saved to /var/cache/conftool/dbconfig/20221027-091249-ladsgroup.json
  • 09:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 09:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36685 and previous config saved to /var/cache/conftool/dbconfig/20221027-091227-ladsgroup.json
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36684 and previous config saved to /var/cache/conftool/dbconfig/20221027-091130-ladsgroup.json
  • 09:10 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 09:10 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 09:09 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 09:09 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 09:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P36683 and previous config saved to /var/cache/conftool/dbconfig/20221027-090030-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36682 and previous config saved to /var/cache/conftool/dbconfig/20221027-085934-ladsgroup.json
  • 08:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1025.eqiad.wmnet with reason: Maintenance
  • 08:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36681 and previous config saved to /var/cache/conftool/dbconfig/20221027-085859-ladsgroup.json
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P36680 and previous config saved to /var/cache/conftool/dbconfig/20221027-085829-marostegui.json
  • 08:57 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:57 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 08:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36679 and previous config saved to /var/cache/conftool/dbconfig/20221027-085720-ladsgroup.json
  • 08:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321123)', diff saved to https://phabricator.wikimedia.org/P36678 and previous config saved to /var/cache/conftool/dbconfig/20221027-085617-marostegui.json
  • 08:56 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 08:56 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@deploy1002: Finished scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" (duration: 04m 52s)
  • 08:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:55 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 08:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:50 marostegui@deploy1002: marostegui and marostegui: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 08:50 marostegui@deploy1002: Started scap: Backport for Revert "ProductionServices.php: Promote pc1014 to pc3 master"
  • 08:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P36676 and previous config saved to /var/cache/conftool/dbconfig/20221027-084523-ladsgroup.json
  • 08:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P36675 and previous config saved to /var/cache/conftool/dbconfig/20221027-084352-ladsgroup.json
  • 08:43 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:43 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36674 and previous config saved to /var/cache/conftool/dbconfig/20221027-084214-ladsgroup.json
  • 08:37 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:37 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:36 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:36 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:32 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 08:32 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 08:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36673 and previous config saved to /var/cache/conftool/dbconfig/20221027-083017-ladsgroup.json
  • 08:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024', diff saved to https://phabricator.wikimedia.org/P36672 and previous config saved to /var/cache/conftool/dbconfig/20221027-082846-ladsgroup.json
  • 08:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36671 and previous config saved to /var/cache/conftool/dbconfig/20221027-082707-ladsgroup.json
  • 08:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T318950)', diff saved to https://phabricator.wikimedia.org/P36670 and previous config saved to /var/cache/conftool/dbconfig/20221027-082211-ladsgroup.json
  • 08:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2025 (T321312)', diff saved to https://phabricator.wikimedia.org/P36669 and previous config saved to /var/cache/conftool/dbconfig/20221027-082157-ladsgroup.json
  • 08:21 marostegui@deploy1002: Finished scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master (duration: 04m 22s)
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36668 and previous config saved to /var/cache/conftool/dbconfig/20221027-082131-ladsgroup.json
  • 08:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:20 elukey: powercycle elastic2043 - no mgmt console tty available, not responsive to ssh, memory/dimm errors in `racadm getsel`
  • 08:19 jbond: upload vim python3-stdlib-extensions to buster component/python39
  • 08:17 marostegui@deploy1002: marostegui and marostegui: Backport for ProductionServices.php: Promote pc1014 to pc3 master synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 08:17 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1009.eqiad.wmnet to cluster eqiad and group C
  • 08:17 marostegui@deploy1002: Started scap: Backport for ProductionServices.php: Promote pc1014 to pc3 master
  • 08:16 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1009.eqiad.wmnet to cluster eqiad and group C
  • 08:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 08:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36667 and previous config saved to /var/cache/conftool/dbconfig/20221027-081534-ladsgroup.json
  • 08:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36666 and previous config saved to /var/cache/conftool/dbconfig/20221027-081339-ladsgroup.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 jnuche@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.7 refs T320512
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T318950)', diff saved to https://phabricator.wikimedia.org/P36665 and previous config saved to /var/cache/conftool/dbconfig/20221027-081113-ladsgroup.json
  • 08:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36664 and previous config saved to /var/cache/conftool/dbconfig/20221027-081103-ladsgroup.json
  • 08:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P36663 and previous config saved to /var/cache/conftool/dbconfig/20221027-080625-ladsgroup.json
  • 08:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P36662 and previous config saved to /var/cache/conftool/dbconfig/20221027-080027-ladsgroup.json
  • 07:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36661 and previous config saved to /var/cache/conftool/dbconfig/20221027-075556-ladsgroup.json
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1005.eqiad.wmnet to plain
  • 07:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es1024 (T321312)', diff saved to https://phabricator.wikimedia.org/P36660 and previous config saved to /var/cache/conftool/dbconfig/20221027-075433-ladsgroup.json
  • 07:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 07:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es1024.eqiad.wmnet with reason: Maintenance
  • 07:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1005.eqiad.wmnet to plain
  • 07:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T318605)', diff saved to https://phabricator.wikimedia.org/P36659 and previous config saved to /var/cache/conftool/dbconfig/20221027-075327-ladsgroup.json
  • 07:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 07:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023', diff saved to https://phabricator.wikimedia.org/P36658 and previous config saved to /var/cache/conftool/dbconfig/20221027-075118-ladsgroup.json
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1005.eqiad.wmnet to drbd
  • 07:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P36657 and previous config saved to /var/cache/conftool/dbconfig/20221027-074521-ladsgroup.json
  • 07:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36656 and previous config saved to /var/cache/conftool/dbconfig/20221027-074050-ladsgroup.json
  • 07:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 07:39 dcausse: restarting blazegraph on wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 07:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1005.eqiad.wmnet to drbd
  • 07:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36655 and previous config saved to /var/cache/conftool/dbconfig/20221027-073612-ladsgroup.json
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 07:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36654 and previous config saved to /var/cache/conftool/dbconfig/20221027-073014-ladsgroup.json
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1009.eqiad.wmnet with OS bullseye
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36653 and previous config saved to /var/cache/conftool/dbconfig/20221027-072543-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling es2023 (T321312)', diff saved to https://phabricator.wikimedia.org/P36652 and previous config saved to /var/cache/conftool/dbconfig/20221027-072536-ladsgroup.json
  • 07:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2023.codfw.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T318950)', diff saved to https://phabricator.wikimedia.org/P36651 and previous config saved to /var/cache/conftool/dbconfig/20221027-072219-ladsgroup.json
  • 07:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 07:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318950)', diff saved to https://phabricator.wikimedia.org/P36650 and previous config saved to /var/cache/conftool/dbconfig/20221027-072157-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36649 and previous config saved to /var/cache/conftool/dbconfig/20221027-072148-ladsgroup.json
  • 07:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 07:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T318950)', diff saved to https://phabricator.wikimedia.org/P36648 and previous config saved to /var/cache/conftool/dbconfig/20221027-071934-ladsgroup.json
  • 07:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 07:09 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1009.eqiad.wmnet with reason: host reimage
  • 07:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P36647 and previous config saved to /var/cache/conftool/dbconfig/20221027-070644-ladsgroup.json
  • 07:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36646 and previous config saved to /var/cache/conftool/dbconfig/20221027-070427-ladsgroup.json
  • 06:54 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1009.eqiad.wmnet with OS bullseye
  • 06:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P36645 and previous config saved to /var/cache/conftool/dbconfig/20221027-065138-ladsgroup.json
  • 06:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 06:49 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1009.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 06:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36644 and previous config saved to /var/cache/conftool/dbconfig/20221027-064921-ladsgroup.json
  • 06:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318950)', diff saved to https://phabricator.wikimedia.org/P36643 and previous config saved to /var/cache/conftool/dbconfig/20221027-063631-ladsgroup.json
  • 06:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1100.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1100 T321178', diff saved to https://phabricator.wikimedia.org/P36639 and previous config saved to /var/cache/conftool/dbconfig/20221027-060654-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1130 to s5 primary and set section read-write T321178', diff saved to https://phabricator.wikimedia.org/P36638 and previous config saved to /var/cache/conftool/dbconfig/20221027-060137-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T321178', diff saved to https://phabricator.wikimedia.org/P36637 and previous config saved to /var/cache/conftool/dbconfig/20221027-060102-ladsgroup.json
  • 06:00 Amir1: Starting s5 eqiad failover from db1100 to db1130 - T321178
  • 05:35 marostegui: Deploy schema change on x1 T318518
  • 05:28 marostegui: dbmaint Switch x1 to SBR T318518
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1130 with weight 0 T321178', diff saved to https://phabricator.wikimedia.org/P36636 and previous config saved to /var/cache/conftool/dbconfig/20221027-052127-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T321178
  • 05:20 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T321178
  • 02:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:56 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: T292552 final configuration (duration: 03m 54s)
  • 02:41 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:40 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:40 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: T292552 allow title case ligatures (duration: 03m 36s)
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:13 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS buster
  • 02:10 tstarling@deploy1002: Synchronized php-1.40.0-wmf.7/includes/language/Language.php: T292552 (duration: 03m 39s)
  • 02:06 tstarling@deploy1002: Synchronized php-1.40.0-wmf.6/includes/language/Language.php: T292552 (duration: 03m 40s)
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 01:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 01:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 01:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 01:47 ejegg: payments-wiki upgraded from 4f923066 to 61cf970b
  • 01:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 01:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 01:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS buster
  • 01:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS buster
  • 00:59 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS buster
  • 00:35 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS buster
  • 00:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:05 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage

2022-10-26

  • 23:40 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS buster
  • 23:23 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4048.ulsfo.wmnet with OS buster
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36635 and previous config saved to /var/cache/conftool/dbconfig/20221026-232136-ladsgroup.json
  • 23:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36634 and previous config saved to /var/cache/conftool/dbconfig/20221026-230630-ladsgroup.json
  • 22:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P36633 and previous config saved to /var/cache/conftool/dbconfig/20221026-225123-ladsgroup.json
  • 22:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4048.ulsfo.wmnet with OS buster
  • 22:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36632 and previous config saved to /var/cache/conftool/dbconfig/20221026-223617-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1198 (T321312)', diff saved to https://phabricator.wikimedia.org/P36631 and previous config saved to /var/cache/conftool/dbconfig/20221026-222956-ladsgroup.json
  • 22:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36630 and previous config saved to /var/cache/conftool/dbconfig/20221026-222932-ladsgroup.json
  • 22:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36629 and previous config saved to /var/cache/conftool/dbconfig/20221026-221426-ladsgroup.json
  • 21:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P36628 and previous config saved to /var/cache/conftool/dbconfig/20221026-215919-ladsgroup.json
  • 21:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36627 and previous config saved to /var/cache/conftool/dbconfig/20221026-214412-ladsgroup.json
  • 21:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1189 (T321312)', diff saved to https://phabricator.wikimedia.org/P36626 and previous config saved to /var/cache/conftool/dbconfig/20221026-213801-ladsgroup.json
  • 21:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36625 and previous config saved to /var/cache/conftool/dbconfig/20221026-213737-ladsgroup.json
  • 21:28 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:23 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36624 and previous config saved to /var/cache/conftool/dbconfig/20221026-212230-ladsgroup.json
  • 21:22 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:15 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:08 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179', diff saved to https://phabricator.wikimedia.org/P36623 and previous config saved to /var/cache/conftool/dbconfig/20221026-210724-ladsgroup.json
  • 20:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36622 and previous config saved to /var/cache/conftool/dbconfig/20221026-205218-ladsgroup.json
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1179 (T321312)', diff saved to https://phabricator.wikimedia.org/P36621 and previous config saved to /var/cache/conftool/dbconfig/20221026-204553-ladsgroup.json
  • 20:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 20:45 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36620 and previous config saved to /var/cache/conftool/dbconfig/20221026-204529-ladsgroup.json
  • 20:42 urbanecm: Deploying security patch for T321733
  • 20:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36619 and previous config saved to /var/cache/conftool/dbconfig/20221026-203022-ladsgroup.json
  • 20:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P36618 and previous config saved to /var/cache/conftool/dbconfig/20221026-201516-ladsgroup.json
  • 20:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4046.ulsfo.wmnet with OS buster
  • 20:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36617 and previous config saved to /var/cache/conftool/dbconfig/20221026-200009-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36616 and previous config saved to /var/cache/conftool/dbconfig/20221026-195342-ladsgroup.json
  • 19:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36615 and previous config saved to /var/cache/conftool/dbconfig/20221026-195318-ladsgroup.json
  • 19:42 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 19:39 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4046.ulsfo.wmnet with reason: host reimage
  • 19:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36614 and previous config saved to /var/cache/conftool/dbconfig/20221026-193811-ladsgroup.json
  • 19:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P36613 and previous config saved to /var/cache/conftool/dbconfig/20221026-192305-ladsgroup.json
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:10 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS buster
  • 19:10 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:08 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4046.ulsfo.wmnet with OS buster
  • 19:08 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36612 and previous config saved to /var/cache/conftool/dbconfig/20221026-190758-ladsgroup.json
  • 19:00 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:45 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 18:41 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:31 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 18:09 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1166 (T321312)', diff saved to https://phabricator.wikimedia.org/P36611 and previous config saved to /var/cache/conftool/dbconfig/20221026-180742-ladsgroup.json
  • 18:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 18:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36610 and previous config saved to /var/cache/conftool/dbconfig/20221026-180718-ladsgroup.json
  • 18:01 Amir1: dbmaint on s8@eqiad (T321562)
  • 18:01 Amir1: dbmaint on s5@eqiad (T321562)
  • 18:00 Amir1: dbmaint on s3@eqiad (T321562)
  • 18:00 Amir1: dbmaint on s1@eqiad (T321562)
  • 17:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P36609 and previous config saved to /var/cache/conftool/dbconfig/20221026-175212-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36607 and previous config saved to /var/cache/conftool/dbconfig/20221026-172159-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36606 and previous config saved to /var/cache/conftool/dbconfig/20221026-171534-ladsgroup.json
  • 17:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 17:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36605 and previous config saved to /var/cache/conftool/dbconfig/20221026-171508-ladsgroup.json
  • 17:02 hashar@deploy1002: Finished deploy [releng/phatality@d8dfa72]: Update Phatality on codfw for OpenSearch Dashboard 2.2.0 # T304440 (duration: 00m 27s)
  • 17:01 hashar@deploy1002: Started deploy [releng/phatality@d8dfa72]: Update Phatality on codfw for OpenSearch Dashboard 2.2.0 # T304440
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36604 and previous config saved to /var/cache/conftool/dbconfig/20221026-170001-ladsgroup.json
  • 16:54 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112', diff saved to https://phabricator.wikimedia.org/P36603 and previous config saved to /var/cache/conftool/dbconfig/20221026-164455-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36602 and previous config saved to /var/cache/conftool/dbconfig/20221026-162948-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 16:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318950)', diff saved to https://phabricator.wikimedia.org/P36601 and previous config saved to /var/cache/conftool/dbconfig/20221026-162549-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1112 (T321312)', diff saved to https://phabricator.wikimedia.org/P36600 and previous config saved to /var/cache/conftool/dbconfig/20221026-162316-ladsgroup.json
  • 16:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1112.eqiad.wmnet with reason: Maintenance
  • 16:11 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS buster
  • 16:11 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM lists1001.wikimedia.org
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36599 and previous config saved to /var/cache/conftool/dbconfig/20221026-161042-ladsgroup.json
  • 16:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:04 urbanecm@deploy1002: Finished scap: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905) (duration: 04m 16s)
  • 16:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dumpsdata1007.eqiad.wmnet
  • 16:03 ladsgroup@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM lists1001.wikimedia.org
  • 16:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS buster
  • 16:00 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 16:00 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 16:00 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 16:00 urbanecm@deploy1002: Started scap: Backport for Revert "kswiki: Switch to wikitext mentor provider back" (T310905)
  • 15:59 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:58 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36598 and previous config saved to /var/cache/conftool/dbconfig/20221026-155738-ladsgroup.json
  • 15:57 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36597 and previous config saved to /var/cache/conftool/dbconfig/20221026-155536-ladsgroup.json
  • 15:54 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dumpsdata1007.eqiad.wmnet
  • 15:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS buster
  • 15:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf1003.eqiad.wmnet
  • 15:51 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf1003.eqiad.wmnet
  • 15:47 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:46 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 15:29 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4039.ulsfo.wmnet with OS buster
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137', diff saved to https://phabricator.wikimedia.org/P36591 and previous config saved to /var/cache/conftool/dbconfig/20221026-152724-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36590 and previous config saved to /var/cache/conftool/dbconfig/20221026-152327-ladsgroup.json
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36589 and previous config saved to /var/cache/conftool/dbconfig/20221026-152259-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36588 and previous config saved to /var/cache/conftool/dbconfig/20221026-151216-ladsgroup.json
  • 15:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36587 and previous config saved to /var/cache/conftool/dbconfig/20221026-150821-ladsgroup.json
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36586 and previous config saved to /var/cache/conftool/dbconfig/20221026-150752-ladsgroup.json
  • 15:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:59 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 14:59 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 14:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1137 (T321312)', diff saved to https://phabricator.wikimedia.org/P36585 and previous config saved to /var/cache/conftool/dbconfig/20221026-145924-ladsgroup.json
  • 14:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 14:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 14:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1137.eqiad.wmnet with reason: Maintenance
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36584 and previous config saved to /var/cache/conftool/dbconfig/20221026-145848-ladsgroup.json
  • 14:56 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 14:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318950)', diff saved to https://phabricator.wikimedia.org/P36581 and previous config saved to /var/cache/conftool/dbconfig/20221026-145314-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318950)', diff saved to https://phabricator.wikimedia.org/P36580 and previous config saved to /var/cache/conftool/dbconfig/20221026-145246-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318950)', diff saved to https://phabricator.wikimedia.org/P36579 and previous config saved to /var/cache/conftool/dbconfig/20221026-145148-ladsgroup.json
  • 14:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36578 and previous config saved to /var/cache/conftool/dbconfig/20221026-145138-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T318950)', diff saved to https://phabricator.wikimedia.org/P36577 and previous config saved to /var/cache/conftool/dbconfig/20221026-145033-ladsgroup.json
  • 14:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36576 and previous config saved to /var/cache/conftool/dbconfig/20221026-145023-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103', diff saved to https://phabricator.wikimedia.org/P36575 and previous config saved to /var/cache/conftool/dbconfig/20221026-144341-ladsgroup.json
  • 14:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:38 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2004.codfw.wmnet
  • 14:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36574 and previous config saved to /var/cache/conftool/dbconfig/20221026-143631-ladsgroup.json
  • 14:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:36 urbanecm@deploy1002: Finished scap: Backport for kswiki: Switch to wikitext mentor provider back (T310905) (duration: 04m 47s)
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36573 and previous config saved to /var/cache/conftool/dbconfig/20221026-143516-ladsgroup.json
  • 14:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2004.codfw.wmnet
  • 14:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host webperf2003.codfw.wmnet
  • 14:31 urbanecm@deploy1002: urbanecm and urbanecm: Backport for kswiki: Switch to wikitext mentor provider back (T310905) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:31 urbanecm@deploy1002: Started scap: Backport for kswiki: Switch to wikitext mentor provider back (T310905)
  • 14:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS buster
  • 14:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host webperf2003.codfw.wmnet
  • 14:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103', diff saved to https://phabricator.wikimedia.org/P36571 and previous config saved to /var/cache/conftool/dbconfig/20221026-142834-ladsgroup.json
  • 14:27 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36570 and previous config saved to /var/cache/conftool/dbconfig/20221026-142700-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36569 and previous config saved to /var/cache/conftool/dbconfig/20221026-142656-root.json
  • 14:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36568 and previous config saved to /var/cache/conftool/dbconfig/20221026-142651-root.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36567 and previous config saved to /var/cache/conftool/dbconfig/20221026-142125-ladsgroup.json
  • 14:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36566 and previous config saved to /var/cache/conftool/dbconfig/20221026-142010-ladsgroup.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36565 and previous config saved to /var/cache/conftool/dbconfig/20221026-141833-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36564 and previous config saved to /var/cache/conftool/dbconfig/20221026-141824-root.json
  • 14:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36563 and previous config saved to /var/cache/conftool/dbconfig/20221026-141823-root.json
  • 14:14 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36562 and previous config saved to /var/cache/conftool/dbconfig/20221026-141328-ladsgroup.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36561 and previous config saved to /var/cache/conftool/dbconfig/20221026-141155-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36560 and previous config saved to /var/cache/conftool/dbconfig/20221026-141151-root.json
  • 14:11 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36559 and previous config saved to /var/cache/conftool/dbconfig/20221026-141146-root.json
  • 14:11 papaul: disable interface et-1/0/2 on cr1-eqiad to bounce fpc 1 pic0
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36558 and previous config saved to /var/cache/conftool/dbconfig/20221026-140618-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1103 (T321312)', diff saved to https://phabricator.wikimedia.org/P36557 and previous config saved to /var/cache/conftool/dbconfig/20221026-140510-ladsgroup.json
  • 14:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 14:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36556 and previous config saved to /var/cache/conftool/dbconfig/20221026-140503-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1103.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36555 and previous config saved to /var/cache/conftool/dbconfig/20221026-140351-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36554 and previous config saved to /var/cache/conftool/dbconfig/20221026-140328-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36553 and previous config saved to /var/cache/conftool/dbconfig/20221026-140320-root.json
  • 14:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36552 and previous config saved to /var/cache/conftool/dbconfig/20221026-140318-root.json
  • 14:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321312)', diff saved to https://phabricator.wikimedia.org/P36551 and previous config saved to /var/cache/conftool/dbconfig/20221026-140312-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T318950)', diff saved to https://phabricator.wikimedia.org/P36550 and previous config saved to /var/cache/conftool/dbconfig/20221026-140250-ladsgroup.json
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36549 and previous config saved to /var/cache/conftool/dbconfig/20221026-140219-ladsgroup.json
  • 13:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 13:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:44 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 13:43 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 13:43 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36539 and previous config saved to /var/cache/conftool/dbconfig/20221026-134145-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36538 and previous config saved to /var/cache/conftool/dbconfig/20221026-134141-root.json
  • 13:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36537 and previous config saved to /var/cache/conftool/dbconfig/20221026-134136-root.json
  • 13:39 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 13:38 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 13:37 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 13:37 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 13:35 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:34 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:33 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36536 and previous config saved to /var/cache/conftool/dbconfig/20221026-133317-ladsgroup.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36535 and previous config saved to /var/cache/conftool/dbconfig/20221026-133312-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36534 and previous config saved to /var/cache/conftool/dbconfig/20221026-133308-root.json
  • 13:33 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36533 and previous config saved to /var/cache/conftool/dbconfig/20221026-133303-root.json
  • 13:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P36532 and previous config saved to /var/cache/conftool/dbconfig/20221026-133259-ladsgroup.json
  • 13:33 urbanecm: UTC afternoon B&C window done
  • 13:32 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P36531 and previous config saved to /var/cache/conftool/dbconfig/20221026-133206-ladsgroup.json
  • 13:30 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549) (duration: 05m 52s)
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:29 sukhe: sudo ipmitool -I lanplus -H "cp4039.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 13:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 moritzm: installing curl security updates on buster
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36530 and previous config saved to /var/cache/conftool/dbconfig/20221026-132640-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36529 and previous config saved to /var/cache/conftool/dbconfig/20221026-132637-root.json
  • 13:26 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36528 and previous config saved to /var/cache/conftool/dbconfig/20221026-132631-root.json
  • 13:25 urbanecm@deploy1002: urbanecm and kharlan: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:24 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Enable link recommendation frontend for 5th round (T304549)
  • 13:23 urbanecm@deploy1002: Finished scap: Backport for [Growth] Enable structured mentor list everywhere (T310905) (duration: 10m 28s)
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:22 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 13:21 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:21 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36527 and previous config saved to /var/cache/conftool/dbconfig/20221026-131810-ladsgroup.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36526 and previous config saved to /var/cache/conftool/dbconfig/20221026-131807-root.json
  • 13:18 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36525 and previous config saved to /var/cache/conftool/dbconfig/20221026-131803-root.json
  • 13:17 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36524 and previous config saved to /var/cache/conftool/dbconfig/20221026-131758-root.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T321312)', diff saved to https://phabricator.wikimedia.org/P36523 and previous config saved to /var/cache/conftool/dbconfig/20221026-131752-ladsgroup.json
  • 13:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36522 and previous config saved to /var/cache/conftool/dbconfig/20221026-131659-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T318950)', diff saved to https://phabricator.wikimedia.org/P36521 and previous config saved to /var/cache/conftool/dbconfig/20221026-131544-ladsgroup.json
  • 13:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 13:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36520 and previous config saved to /var/cache/conftool/dbconfig/20221026-131534-ladsgroup.json
  • 13:15 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:03 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36513 and previous config saved to /var/cache/conftool/dbconfig/20221026-130302-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36512 and previous config saved to /var/cache/conftool/dbconfig/20221026-130258-root.json
  • 13:02 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36511 and previous config saved to /var/cache/conftool/dbconfig/20221026-130253-root.json
  • 13:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P36510 and previous config saved to /var/cache/conftool/dbconfig/20221026-130027-ladsgroup.json
  • 12:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36509 and previous config saved to /var/cache/conftool/dbconfig/20221026-125918-ladsgroup.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36508 and previous config saved to /var/cache/conftool/dbconfig/20221026-125630-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36507 and previous config saved to /var/cache/conftool/dbconfig/20221026-125626-root.json
  • 12:56 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36506 and previous config saved to /var/cache/conftool/dbconfig/20221026-125621-root.json
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P36505 and previous config saved to /var/cache/conftool/dbconfig/20221026-125604-ladsgroup.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36504 and previous config saved to /var/cache/conftool/dbconfig/20221026-124757-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36503 and previous config saved to /var/cache/conftool/dbconfig/20221026-124753-root.json
  • 12:47 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36502 and previous config saved to /var/cache/conftool/dbconfig/20221026-124748-root.json
  • 12:47 moritzm: installing isc-dhcp security updates
  • 12:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P36501 and previous config saved to /var/cache/conftool/dbconfig/20221026-124521-ladsgroup.json
  • 12:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36500 and previous config saved to /var/cache/conftool/dbconfig/20221026-124411-ladsgroup.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1032 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36499 and previous config saved to /var/cache/conftool/dbconfig/20221026-124125-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1034 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36498 and previous config saved to /var/cache/conftool/dbconfig/20221026-124121-root.json
  • 12:41 marostegui@cumin1001: dbctl commit (dc=all): 'es1033 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36497 and previous config saved to /var/cache/conftool/dbconfig/20221026-124116-root.json
  • 12:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P36496 and previous config saved to /var/cache/conftool/dbconfig/20221026-124057-ladsgroup.json
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
  • 12:38 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to plain
  • 12:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1034 es1033 es1032', diff saved to https://phabricator.wikimedia.org/P36495 and previous config saved to /var/cache/conftool/dbconfig/20221026-123545-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2032 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36494 and previous config saved to /var/cache/conftool/dbconfig/20221026-123252-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2033 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36493 and previous config saved to /var/cache/conftool/dbconfig/20221026-123248-root.json
  • 12:32 marostegui@cumin1001: dbctl commit (dc=all): 'es2034 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36492 and previous config saved to /var/cache/conftool/dbconfig/20221026-123243-root.json
  • 12:32 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS buster
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36491 and previous config saved to /var/cache/conftool/dbconfig/20221026-123014-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36490 and previous config saved to /var/cache/conftool/dbconfig/20221026-122905-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36489 and previous config saved to /var/cache/conftool/dbconfig/20221026-122748-ladsgroup.json
  • 12:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36488 and previous config saved to /var/cache/conftool/dbconfig/20221026-122726-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T318950)', diff saved to https://phabricator.wikimedia.org/P36487 and previous config saved to /var/cache/conftool/dbconfig/20221026-122652-ladsgroup.json
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36486 and previous config saved to /var/cache/conftool/dbconfig/20221026-122641-ladsgroup.json
  • 12:26 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2034 es2033 es2032', diff saved to https://phabricator.wikimedia.org/P36485 and previous config saved to /var/cache/conftool/dbconfig/20221026-122632-root.json
  • 12:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T321312)', diff saved to https://phabricator.wikimedia.org/P36484 and previous config saved to /var/cache/conftool/dbconfig/20221026-122550-ladsgroup.json
  • 12:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1188 (T321312)', diff saved to https://phabricator.wikimedia.org/P36483 and previous config saved to /var/cache/conftool/dbconfig/20221026-121928-ladsgroup.json
  • 12:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36482 and previous config saved to /var/cache/conftool/dbconfig/20221026-121853-ladsgroup.json
  • 12:13 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1003.eqiad.wmnet to drbd
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36481 and previous config saved to /var/cache/conftool/dbconfig/20221026-121220-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36480 and previous config saved to /var/cache/conftool/dbconfig/20221026-121122-ladsgroup.json
  • 12:06 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS buster
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P36479 and previous config saved to /var/cache/conftool/dbconfig/20221026-120346-ladsgroup.json
  • 12:02 moritzm: draining ganeti1009 for eventual reimage T311687
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36478 and previous config saved to /var/cache/conftool/dbconfig/20221026-115714-ladsgroup.json
  • 11:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36477 and previous config saved to /var/cache/conftool/dbconfig/20221026-115615-ladsgroup.json
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P36476 and previous config saved to /var/cache/conftool/dbconfig/20221026-114840-ladsgroup.json
  • 11:46 sukhe: sudo ipmitool -I lanplus -H "cp4046.mgmt.ulsfo.wmnet" -U root -E chassis power cycle
  • 11:45 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 11:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36475 and previous config saved to /var/cache/conftool/dbconfig/20221026-114207-ladsgroup.json
  • 11:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36474 and previous config saved to /var/cache/conftool/dbconfig/20221026-114109-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T318950)', diff saved to https://phabricator.wikimedia.org/P36473 and previous config saved to /var/cache/conftool/dbconfig/20221026-113941-ladsgroup.json
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36472 and previous config saved to /var/cache/conftool/dbconfig/20221026-113925-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36471 and previous config saved to /var/cache/conftool/dbconfig/20221026-113856-ladsgroup.json
  • 11:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 11:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36470 and previous config saved to /var/cache/conftool/dbconfig/20221026-113835-ladsgroup.json
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36469 and previous config saved to /var/cache/conftool/dbconfig/20221026-113333-ladsgroup.json
  • 11:33 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4046.ulsfo.wmnet with OS buster
  • 11:29 sukhe@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4046
  • 11:29 sukhe@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4046
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36468 and previous config saved to /var/cache/conftool/dbconfig/20221026-112634-ladsgroup.json
  • 11:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 11:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36467 and previous config saved to /var/cache/conftool/dbconfig/20221026-112609-ladsgroup.json
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36466 and previous config saved to /var/cache/conftool/dbconfig/20221026-112419-ladsgroup.json
  • 11:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4046.ulsfo.wmnet with OS buster
  • 11:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1015.eqiad.wmnet to cluster eqiad and group B
  • 11:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36465 and previous config saved to /var/cache/conftool/dbconfig/20221026-112328-ladsgroup.json
  • 11:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1015.eqiad.wmnet to cluster eqiad and group B
  • 11:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1023.eqiad.wmnet to cluster eqiad and group B
  • 11:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1023.eqiad.wmnet to cluster eqiad and group B
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P36461 and previous config saved to /var/cache/conftool/dbconfig/20221026-105556-ladsgroup.json
  • 10:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36460 and previous config saved to /var/cache/conftool/dbconfig/20221026-105406-ladsgroup.json
  • 10:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36459 and previous config saved to /var/cache/conftool/dbconfig/20221026-105315-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T318950)', diff saved to https://phabricator.wikimedia.org/P36458 and previous config saved to /var/cache/conftool/dbconfig/20221026-105140-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36457 and previous config saved to /var/cache/conftool/dbconfig/20221026-105129-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T318950)', diff saved to https://phabricator.wikimedia.org/P36456 and previous config saved to /var/cache/conftool/dbconfig/20221026-105102-ladsgroup.json
  • 10:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 10:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36455 and previous config saved to /var/cache/conftool/dbconfig/20221026-105052-ladsgroup.json
  • 10:50 dcausse: restarting blazegraph on wdqs1007 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 10:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 10:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:46 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1023.eqiad.wmnet to cluster eqiad and group A
  • 10:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36454 and previous config saved to /var/cache/conftool/dbconfig/20221026-104050-ladsgroup.json
  • 10:38 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36453 and previous config saved to /var/cache/conftool/dbconfig/20221026-103623-ladsgroup.json
  • 10:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36452 and previous config saved to /var/cache/conftool/dbconfig/20221026-103545-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1162 (T321312)', diff saved to https://phabricator.wikimedia.org/P36451 and previous config saved to /var/cache/conftool/dbconfig/20221026-103432-ladsgroup.json
  • 10:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 10:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36450 and previous config saved to /var/cache/conftool/dbconfig/20221026-103407-ladsgroup.json
  • 10:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36449 and previous config saved to /var/cache/conftool/dbconfig/20221026-103138-ladsgroup.json
  • 10:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36448 and previous config saved to /var/cache/conftool/dbconfig/20221026-102116-ladsgroup.json
  • 10:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36447 and previous config saved to /var/cache/conftool/dbconfig/20221026-102039-ladsgroup.json
  • 10:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36446 and previous config saved to /var/cache/conftool/dbconfig/20221026-101901-ladsgroup.json
  • 10:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P36445 and previous config saved to /var/cache/conftool/dbconfig/20221026-101631-ladsgroup.json
  • 10:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36444 and previous config saved to /var/cache/conftool/dbconfig/20221026-100610-ladsgroup.json
  • 10:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36443 and previous config saved to /var/cache/conftool/dbconfig/20221026-100532-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P36442 and previous config saved to /var/cache/conftool/dbconfig/20221026-100354-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T318950)', diff saved to https://phabricator.wikimedia.org/P36441 and previous config saved to /var/cache/conftool/dbconfig/20221026-100344-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T318950)', diff saved to https://phabricator.wikimedia.org/P36440 and previous config saved to /var/cache/conftool/dbconfig/20221026-100319-ladsgroup.json
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 10:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131', diff saved to https://phabricator.wikimedia.org/P36439 and previous config saved to /var/cache/conftool/dbconfig/20221026-100125-ladsgroup.json
  • 09:59 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9038
  • 09:59 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9038
  • 09:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 59605
  • 09:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 59605
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36438 and previous config saved to /var/cache/conftool/dbconfig/20221026-094841-ladsgroup.json
  • 09:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36437 and previous config saved to /var/cache/conftool/dbconfig/20221026-094619-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1156 (T321312)', diff saved to https://phabricator.wikimedia.org/P36436 and previous config saved to /var/cache/conftool/dbconfig/20221026-094226-ladsgroup.json
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36435 and previous config saved to /var/cache/conftool/dbconfig/20221026-094154-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36434 and previous config saved to /var/cache/conftool/dbconfig/20221026-093842-ladsgroup.json
  • 09:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2131.codfw.wmnet with reason: Maintenance
  • 09:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36433 and previous config saved to /var/cache/conftool/dbconfig/20221026-093816-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36432 and previous config saved to /var/cache/conftool/dbconfig/20221026-093509-ladsgroup.json
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 09:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36431 and previous config saved to /var/cache/conftool/dbconfig/20221026-092647-ladsgroup.json
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1015.eqiad.wmnet with OS bullseye
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P36430 and previous config saved to /var/cache/conftool/dbconfig/20221026-092310-ladsgroup.json
  • 09:20 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36429 and previous config saved to /var/cache/conftool/dbconfig/20221026-092004-ladsgroup.json
  • 09:17 jbond: add netbox yes will do thanks
  • 09:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129', diff saved to https://phabricator.wikimedia.org/P36428 and previous config saved to /var/cache/conftool/dbconfig/20221026-091141-ladsgroup.json
  • 09:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1015.eqiad.wmnet with reason: host reimage
  • 09:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115', diff saved to https://phabricator.wikimedia.org/P36427 and previous config saved to /var/cache/conftool/dbconfig/20221026-090803-ladsgroup.json
  • 09:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1015.eqiad.wmnet with reason: host reimage
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36426 and previous config saved to /var/cache/conftool/dbconfig/20221026-090459-ladsgroup.json
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36425 and previous config saved to /var/cache/conftool/dbconfig/20221026-085634-ladsgroup.json
  • 08:55 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:55 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36424 and previous config saved to /var/cache/conftool/dbconfig/20221026-085257-ladsgroup.json
  • 08:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1015.eqiad.wmnet with OS bullseye
  • 08:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36423 and previous config saved to /var/cache/conftool/dbconfig/20221026-085022-ladsgroup.json
  • 08:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1129.eqiad.wmnet with reason: Maintenance
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Maint finished', diff saved to https://phabricator.wikimedia.org/P36422 and previous config saved to /var/cache/conftool/dbconfig/20221026-084954-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P36421 and previous config saved to /var/cache/conftool/dbconfig/20221026-084922-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2115 (T321312)', diff saved to https://phabricator.wikimedia.org/P36420 and previous config saved to /var/cache/conftool/dbconfig/20221026-084741-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 08:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2115.codfw.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1102.eqiad.wmnet with reason: Maintenance
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36418 and previous config saved to /var/cache/conftool/dbconfig/20221026-083157-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36417 and previous config saved to /var/cache/conftool/dbconfig/20221026-083149-root.json
  • 08:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36416 and previous config saved to /var/cache/conftool/dbconfig/20221026-083142-root.json
  • 08:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 08:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1015.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 08:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 08:18 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:17 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36415 and previous config saved to /var/cache/conftool/dbconfig/20221026-081652-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36414 and previous config saved to /var/cache/conftool/dbconfig/20221026-081644-root.json
  • 08:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36413 and previous config saved to /var/cache/conftool/dbconfig/20221026-081637-root.json
  • 08:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:15 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 08:14 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.7 refs T320512 (duration: 03m 46s)
  • 08:12 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 08:11 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 08:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:11 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 08:10 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 08:08 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.7 refs T320512
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36412 and previous config saved to /var/cache/conftool/dbconfig/20221026-080643-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36411 and previous config saved to /var/cache/conftool/dbconfig/20221026-080638-root.json
  • 08:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36410 and previous config saved to /var/cache/conftool/dbconfig/20221026-080631-root.json
  • 08:02 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36409 and previous config saved to /var/cache/conftool/dbconfig/20221026-080147-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36408 and previous config saved to /var/cache/conftool/dbconfig/20221026-080139-root.json
  • 08:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36407 and previous config saved to /var/cache/conftool/dbconfig/20221026-080132-root.json
  • 08:00 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:00 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 08:00 jbond: upload python3.9 packages for buster (component python39)
  • 07:57 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 07:57 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 07:56 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:56 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:55 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:55 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:53 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:52 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36406 and previous config saved to /var/cache/conftool/dbconfig/20221026-075138-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36405 and previous config saved to /var/cache/conftool/dbconfig/20221026-075133-root.json
  • 07:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36404 and previous config saved to /var/cache/conftool/dbconfig/20221026-075126-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36403 and previous config saved to /var/cache/conftool/dbconfig/20221026-074642-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36402 and previous config saved to /var/cache/conftool/dbconfig/20221026-074634-root.json
  • 07:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36401 and previous config saved to /var/cache/conftool/dbconfig/20221026-074627-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36400 and previous config saved to /var/cache/conftool/dbconfig/20221026-073633-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36399 and previous config saved to /var/cache/conftool/dbconfig/20221026-073628-root.json
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36398 and previous config saved to /var/cache/conftool/dbconfig/20221026-073621-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36397 and previous config saved to /var/cache/conftool/dbconfig/20221026-073137-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36396 and previous config saved to /var/cache/conftool/dbconfig/20221026-073129-root.json
  • 07:31 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36395 and previous config saved to /var/cache/conftool/dbconfig/20221026-073122-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36394 and previous config saved to /var/cache/conftool/dbconfig/20221026-072128-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36393 and previous config saved to /var/cache/conftool/dbconfig/20221026-072123-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36392 and previous config saved to /var/cache/conftool/dbconfig/20221026-072116-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36391 and previous config saved to /var/cache/conftool/dbconfig/20221026-071632-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36390 and previous config saved to /var/cache/conftool/dbconfig/20221026-071624-root.json
  • 07:16 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36389 and previous config saved to /var/cache/conftool/dbconfig/20221026-071617-root.json
  • 07:10 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 07:09 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 07:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 07:08 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36388 and previous config saved to /var/cache/conftool/dbconfig/20221026-070623-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36387 and previous config saved to /var/cache/conftool/dbconfig/20221026-070618-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36386 and previous config saved to /var/cache/conftool/dbconfig/20221026-070611-root.json
  • 07:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 25091
  • 07:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 25091
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36385 and previous config saved to /var/cache/conftool/dbconfig/20221026-070127-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36384 and previous config saved to /var/cache/conftool/dbconfig/20221026-070119-root.json
  • 07:01 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36383 and previous config saved to /var/cache/conftool/dbconfig/20221026-070112-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36382 and previous config saved to /var/cache/conftool/dbconfig/20221026-065118-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36381 and previous config saved to /var/cache/conftool/dbconfig/20221026-065113-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36380 and previous config saved to /var/cache/conftool/dbconfig/20221026-065106-root.json
  • 06:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16509
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2029 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36379 and previous config saved to /var/cache/conftool/dbconfig/20221026-064622-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36378 and previous config saved to /var/cache/conftool/dbconfig/20221026-064614-root.json
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'es2031 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36377 and previous config saved to /var/cache/conftool/dbconfig/20221026-064607-root.json
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16509
  • 06:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 63949
  • 06:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 63949
  • 06:38 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36376 and previous config saved to /var/cache/conftool/dbconfig/20221026-063613-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36375 and previous config saved to /var/cache/conftool/dbconfig/20221026-063608-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36374 and previous config saved to /var/cache/conftool/dbconfig/20221026-063601-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2029 es2030 es2031', diff saved to https://phabricator.wikimedia.org/P36373 and previous config saved to /var/cache/conftool/dbconfig/20221026-063524-root.json
  • 06:34 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 06:33 _joe_: build2001:~# docker-registryctl delete-tags docker-registry.discovery.wmnet/httpd-fcgi:2.4.38-7 (to fix the uid issues)
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1031 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36372 and previous config saved to /var/cache/conftool/dbconfig/20221026-062108-root.json
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'es1030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36371 and previous config saved to /var/cache/conftool/dbconfig/20221026-062103-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'es1029 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36370 and previous config saved to /var/cache/conftool/dbconfig/20221026-062056-root.json
  • 06:18 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es1029 es1030 es1031', diff saved to https://phabricator.wikimedia.org/P36369 and previous config saved to /var/cache/conftool/dbconfig/20221026-061044-root.json
  • 06:08 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply

2022-10-25

  • 22:33 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4052.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:32 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4051.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:30 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:28 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 22:28 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:24 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:20 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4010.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:19 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4009.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:18 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4050.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4048.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4050.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:04 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4048.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:03 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4046.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36368 and previous config saved to /var/cache/conftool/dbconfig/20221025-220249-ladsgroup.json
  • 22:01 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:01 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4051
  • 22:00 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4051
  • 21:59 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4044.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:51 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4035.ulsfo.wmnet
  • 21:51 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:50 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36367 and previous config saved to /var/cache/conftool/dbconfig/20221025-214743-ladsgroup.json
  • 21:47 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4010
  • 21:46 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:46 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4010
  • 21:46 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4009
  • 21:45 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4009
  • 21:43 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4044.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:43 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:42 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4035.ulsfo.wmnet
  • 21:41 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:38 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4033.ulsfo.wmnet
  • 21:38 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:37 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:34 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P36366 and previous config saved to /var/cache/conftool/dbconfig/20221025-213236-ladsgroup.json
  • 21:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4033.ulsfo.wmnet
  • 21:28 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4042.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 21:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36365 and previous config saved to /var/cache/conftool/dbconfig/20221025-211730-ladsgroup.json
  • 21:15 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4033.ulsfo.wmnet
  • 21:12 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp4035.ulsfo.wmnet
  • 21:12 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T321312)', diff saved to https://phabricator.wikimedia.org/P36364 and previous config saved to /var/cache/conftool/dbconfig/20221025-211125-ladsgroup.json
  • 21:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36363 and previous config saved to /var/cache/conftool/dbconfig/20221025-211058-ladsgroup.json
  • 21:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:06 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:05 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4040.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4035.ulsfo.wmnet
  • 21:03 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:03 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4033.ulsfo.wmnet
  • 21:03 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:01 cjming: end of UTC late backport window
  • 20:59 cjming@deploy1002: Finished scap: Backport for Revert tagline of zhwiki (cont.) (duration: 04m 49s)
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36362 and previous config saved to /var/cache/conftool/dbconfig/20221025-205902-ladsgroup.json
  • 20:59 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36361 and previous config saved to /var/cache/conftool/dbconfig/20221025-205551-ladsgroup.json
  • 20:55 cjming@deploy1002: cjming and stang: Backport for Revert tagline of zhwiki (cont.) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:55 cjming@deploy1002: Started scap: Backport for Revert tagline of zhwiki (cont.)
  • 20:53 cjming@deploy1002: Finished scap: Backport for Revert tagline of zhwiki (duration: 09m 11s)
  • 20:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4052
  • 20:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4052
  • 20:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4050
  • 20:48 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4050
  • 20:48 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:44 cjming@deploy1002: cjming and stang: Backport for Revert tagline of zhwiki synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:44 cjming@deploy1002: Started scap: Backport for Revert tagline of zhwiki
  • 20:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36360 and previous config saved to /var/cache/conftool/dbconfig/20221025-204356-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P36359 and previous config saved to /var/cache/conftool/dbconfig/20221025-204045-ladsgroup.json
  • 20:37 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:30 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:29 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:29 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P36358 and previous config saved to /var/cache/conftool/dbconfig/20221025-202849-ladsgroup.json
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:28 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:27 cjming@deploy1002: Finished scap: Backport for Update remaining Wikipedia logos (T319223) (duration: 06m 48s)
  • 20:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36357 and previous config saved to /var/cache/conftool/dbconfig/20221025-202538-ladsgroup.json
  • 20:24 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4034.ulsfo.wmnet
  • 20:24 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:23 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4036.ulsfo.wmnet
  • 20:23 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:21 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:20 cjming@deploy1002: cjming and jdlrobson: Backport for Update remaining Wikipedia logos (T319223) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:20 cjming@deploy1002: Started scap: Backport for Update remaining Wikipedia logos (T319223)
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T321312)', diff saved to https://phabricator.wikimedia.org/P36356 and previous config saved to /var/cache/conftool/dbconfig/20221025-201918-ladsgroup.json
  • 20:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 20:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36355 and previous config saved to /var/cache/conftool/dbconfig/20221025-201852-ladsgroup.json
  • 20:15 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4036.ulsfo.wmnet
  • 20:14 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4034.ulsfo.wmnet
  • 20:14 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts cp4034.ulsfo.wmnet
  • 20:13 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4034.ulsfo.wmnet
  • 20:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36354 and previous config saved to /var/cache/conftool/dbconfig/20221025-201343-ladsgroup.json
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4048
  • 20:11 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4048
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4046
  • 20:11 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4046
  • 20:11 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4044
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4044
  • 20:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4042
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4042
  • 20:10 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4040
  • 20:10 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4040
  • 20:09 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T321312)', diff saved to https://phabricator.wikimedia.org/P36353 and previous config saved to /var/cache/conftool/dbconfig/20221025-200746-ladsgroup.json
  • 20:07 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 20:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 20:07 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 20:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36352 and previous config saved to /var/cache/conftool/dbconfig/20221025-200723-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36351 and previous config saved to /var/cache/conftool/dbconfig/20221025-200344-ladsgroup.json
  • 20:00 volans@cumin2002: START - Cookbook sre.hosts.provision for host cp4038.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:57 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4038
  • 19:56 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4038
  • 19:56 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:54 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36350 and previous config saved to /var/cache/conftool/dbconfig/20221025-195216-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P36349 and previous config saved to /var/cache/conftool/dbconfig/20221025-194838-ladsgroup.json
  • 19:39 cwhite: logstash opensearch 2.2.0 codfw transition complete T304440
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P36348 and previous config saved to /var/cache/conftool/dbconfig/20221025-193709-ladsgroup.json
  • 19:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36347 and previous config saved to /var/cache/conftool/dbconfig/20221025-193331-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T321312)', diff saved to https://phabricator.wikimedia.org/P36345 and previous config saved to /var/cache/conftool/dbconfig/20221025-192556-ladsgroup.json
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36344 and previous config saved to /var/cache/conftool/dbconfig/20221025-192526-ladsgroup.json
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36343 and previous config saved to /var/cache/conftool/dbconfig/20221025-192203-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T321312)', diff saved to https://phabricator.wikimedia.org/P36342 and previous config saved to /var/cache/conftool/dbconfig/20221025-191552-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36341 and previous config saved to /var/cache/conftool/dbconfig/20221025-191527-ladsgroup.json
  • 19:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36340 and previous config saved to /var/cache/conftool/dbconfig/20221025-191020-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36339 and previous config saved to /var/cache/conftool/dbconfig/20221025-190021-ladsgroup.json
  • 18:59 inflatador: bking@elastic2070 'restarting elastic7 services to apply 838141'
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4032.ulsfo.wmnet
  • 18:57 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P36338 and previous config saved to /var/cache/conftool/dbconfig/20221025-185513-ladsgroup.json
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4028.ulsfo.wmnet
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:53 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4030.ulsfo.wmnet
  • 18:52 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:50 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4032.ulsfo.wmnet
  • 18:49 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4026.ulsfo.wmnet
  • 18:49 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:46 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:45 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4030.ulsfo.wmnet
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P36337 and previous config saved to /var/cache/conftool/dbconfig/20221025-184514-ladsgroup.json
  • 18:44 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4028.ulsfo.wmnet
  • 18:41 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4026.ulsfo.wmnet
  • 18:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36336 and previous config saved to /var/cache/conftool/dbconfig/20221025-184006-ladsgroup.json
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4024.ulsfo.wmnet
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4022.ulsfo.wmnet
  • 18:37 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:34 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T321312)', diff saved to https://phabricator.wikimedia.org/P36335 and previous config saved to /var/cache/conftool/dbconfig/20221025-183224-ladsgroup.json
  • 18:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 18:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36334 and previous config saved to /var/cache/conftool/dbconfig/20221025-183158-ladsgroup.json
  • 18:31 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4024.ulsfo.wmnet
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36333 and previous config saved to /var/cache/conftool/dbconfig/20221025-183008-ladsgroup.json
  • 18:29 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4022.ulsfo.wmnet
  • 18:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T321312)', diff saved to https://phabricator.wikimedia.org/P36332 and previous config saved to /var/cache/conftool/dbconfig/20221025-182402-ladsgroup.json
  • 18:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 18:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36331 and previous config saved to /var/cache/conftool/dbconfig/20221025-182336-ladsgroup.json
  • 18:22 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4043.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:20 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4039.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:19 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36330 and previous config saved to /var/cache/conftool/dbconfig/20221025-181652-ladsgroup.json
  • 18:09 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4043.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36329 and previous config saved to /var/cache/conftool/dbconfig/20221025-180830-ladsgroup.json
  • 18:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4041.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4039.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:05 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4043
  • 18:05 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4043
  • 18:04 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4041
  • 18:04 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4041
  • 18:04 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4039
  • 18:04 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4039
  • 18:03 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P36328 and previous config saved to /var/cache/conftool/dbconfig/20221025-180145-ladsgroup.json
  • 18:00 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:58 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:58 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4023.ulsfo.wmnet
  • 17:58 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:57 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4025.ulsfo.wmnet
  • 17:57 robh@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P36327 and previous config saved to /var/cache/conftool/dbconfig/20221025-175323-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36326 and previous config saved to /var/cache/conftool/dbconfig/20221025-174639-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T321312)', diff saved to https://phabricator.wikimedia.org/P36325 and previous config saved to /var/cache/conftool/dbconfig/20221025-174013-ladsgroup.json
  • 17:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36324 and previous config saved to /var/cache/conftool/dbconfig/20221025-173817-ladsgroup.json
  • 17:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T321312)', diff saved to https://phabricator.wikimedia.org/P36323 and previous config saved to /var/cache/conftool/dbconfig/20221025-172909-ladsgroup.json
  • 17:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36322 and previous config saved to /var/cache/conftool/dbconfig/20221025-172844-ladsgroup.json
  • 17:25 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d3b7785] (duration: 01m 04s)
  • 17:23 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@d3b7785]
  • 17:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36321 and previous config saved to /var/cache/conftool/dbconfig/20221025-171337-ladsgroup.json
  • 17:12 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 17:05 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:05 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:02 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 17:02 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
  • 17:02 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 16:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P36320 and previous config saved to /var/cache/conftool/dbconfig/20221025-165831-ladsgroup.json
  • 16:53 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4025.ulsfo.wmnet
  • 16:52 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4023.ulsfo.wmnet
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T321312)', diff saved to https://phabricator.wikimedia.org/P36319 and previous config saved to /var/cache/conftool/dbconfig/20221025-165028-ladsgroup.json
  • 16:47 papaul: disable interface et-1/0/2 on cr2-eqiad to bounce fpc 1 pic0
  • 16:43 volans: uploaded python3-gjson 0.2.1 to apt.wikimedia.org bullseye-wikimedia
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36318 and previous config saved to /var/cache/conftool/dbconfig/20221025-164324-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T321312)', diff saved to https://phabricator.wikimedia.org/P36317 and previous config saved to /var/cache/conftool/dbconfig/20221025-163719-ladsgroup.json
  • 16:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36316 and previous config saved to /var/cache/conftool/dbconfig/20221025-163522-ladsgroup.json
  • 16:33 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P36315 and previous config saved to /var/cache/conftool/dbconfig/20221025-162015-ladsgroup.json
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wcqs2002.codfw.wmnet
  • 16:06 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for wcqs2002.codfw.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T321312)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221025-160504-ladsgroup.json
  • 16:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 16:04 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97)
  • 16:04 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T321312)', diff saved to https://phabricator.wikimedia.org/P36313 and previous config saved to /var/cache/conftool/dbconfig/20221025-155855-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36312 and previous config saved to /var/cache/conftool/dbconfig/20221025-155828-ladsgroup.json
  • 15:56 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0)
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36311 and previous config saved to /var/cache/conftool/dbconfig/20221025-154321-ladsgroup.json
  • 15:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8399
  • 15:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 8399
  • 15:30 claime: added package otelcol-contrib_0.62.1_linux_amd64.deb to component thirdparty/otelcol-contrib for bullseye-wikimedia and buster-wikimedia - T320551
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P36310 and previous config saved to /var/cache/conftool/dbconfig/20221025-152815-ladsgroup.json
  • 15:25 claime: added component thirdparty/otelcol-contrib to apt repository
  • 15:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36309 and previous config saved to /var/cache/conftool/dbconfig/20221025-151308-ladsgroup.json
  • 15:11 moritzm: installing isc-dhcp security updates
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T321312)', diff saved to https://phabricator.wikimedia.org/P36308 and previous config saved to /var/cache/conftool/dbconfig/20221025-150653-ladsgroup.json
  • 15:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36307 and previous config saved to /var/cache/conftool/dbconfig/20221025-150626-ladsgroup.json
  • 14:54 sukhe: running authdns-update for depooling ulsfo: Gerrit 849105
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36306 and previous config saved to /var/cache/conftool/dbconfig/20221025-145120-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36305 and previous config saved to /var/cache/conftool/dbconfig/20221025-144651-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P36304 and previous config saved to /var/cache/conftool/dbconfig/20221025-143613-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P36303 and previous config saved to /var/cache/conftool/dbconfig/20221025-143144-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: db1154 having hw issues
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb1016.eqiad.wmnet with reason: db1154 having hw issues
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36301 and previous config saved to /var/cache/conftool/dbconfig/20221025-142106-ladsgroup.json
  • 14:18 hashar: Restarting CI Jenkins
  • 14:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P36300 and previous config saved to /var/cache/conftool/dbconfig/20221025-141638-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36299 and previous config saved to /var/cache/conftool/dbconfig/20221025-141440-ladsgroup.json
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T321312)', diff saved to https://phabricator.wikimedia.org/P36298 and previous config saved to /var/cache/conftool/dbconfig/20221025-141358-ladsgroup.json
  • 14:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36297 and previous config saved to /var/cache/conftool/dbconfig/20221025-140131-ladsgroup.json
  • 13:59 XioNoX: test bouncing VC port on asw2-d-eqiad
  • 13:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36296 and previous config saved to /var/cache/conftool/dbconfig/20221025-135852-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1201 (T321312)', diff saved to https://phabricator.wikimedia.org/P36295 and previous config saved to /var/cache/conftool/dbconfig/20221025-135515-ladsgroup.json
  • 13:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36294 and previous config saved to /var/cache/conftool/dbconfig/20221025-135451-ladsgroup.json
  • 13:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1020-1021].eqiad.wmnet with reason: db1154 having hw issues
  • 13:53 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1020-1021].eqiad.wmnet with reason: db1154 having hw issues
  • 13:53 jgiannelos@deploy1002: Finished deploy [restbase/deploy@5575605]: Update restbase to c1d391c7 (duration: 18m 14s)
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P36293 and previous config saved to /var/cache/conftool/dbconfig/20221025-134345-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P36292 and previous config saved to /var/cache/conftool/dbconfig/20221025-133944-ladsgroup.json
  • 13:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 13:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Maintenance
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 13:35 jgiannelos@deploy1002: Started deploy [restbase/deploy@5575605]: Update restbase to c1d391c7
  • 13:34 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to plain
  • 13:34 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:33 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519) (duration: 05m 47s)
  • 13:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Maintenance
  • 13:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T321312)', diff saved to https://phabricator.wikimedia.org/P36291 and previous config saved to /var/cache/conftool/dbconfig/20221025-132839-ladsgroup.json
  • 13:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:27 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:27 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Move wmgSiteLogoVariants to logos.php (T308620 T321519)
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P36290 and previous config saved to /var/cache/conftool/dbconfig/20221025-132438-ladsgroup.json
  • 13:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of ml-etcd1001.eqiad.wmnet to drbd
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36287 and previous config saved to /var/cache/conftool/dbconfig/20221025-130931-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36286 and previous config saved to /var/cache/conftool/dbconfig/20221025-130628-ladsgroup.json
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1187 (T321312)', diff saved to https://phabricator.wikimedia.org/P36285 and previous config saved to /var/cache/conftool/dbconfig/20221025-130314-ladsgroup.json
  • 13:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36284 and previous config saved to /var/cache/conftool/dbconfig/20221025-130249-ladsgroup.json
  • 12:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 12:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1023.eqiad.wmnet with reason: host reimage
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P36283 and previous config saved to /var/cache/conftool/dbconfig/20221025-125122-ladsgroup.json
  • 12:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P36281 and previous config saved to /var/cache/conftool/dbconfig/20221025-124743-ladsgroup.json
  • 12:39 moritzm: drain ganeti1015 for eventual reimage T311687
  • 12:38 hashar: Restarting CI Jenkins
  • 12:38 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1023.eqiad.wmnet with OS bullseye
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T321312)', diff saved to https://phabricator.wikimedia.org/P36280 and previous config saved to /var/cache/conftool/dbconfig/20221025-123615-ladsgroup.json
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1023.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 12:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1023.eqiad.wmnet with reason: Remove from cluster for eventual reimage
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P36279 and previous config saved to /var/cache/conftool/dbconfig/20221025-123236-ladsgroup.json
  • 12:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T321312)', diff saved to https://phabricator.wikimedia.org/P36278 and previous config saved to /var/cache/conftool/dbconfig/20221025-123001-ladsgroup.json
  • 12:29 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 12:23 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2519
  • 12:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2519
  • 12:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36277 and previous config saved to /var/cache/conftool/dbconfig/20221025-122015-ladsgroup.json
  • 12:19 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 12:19 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 12:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36276 and previous config saved to /var/cache/conftool/dbconfig/20221025-121730-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36275 and previous config saved to /var/cache/conftool/dbconfig/20221025-121111-ladsgroup.json
  • 12:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36274 and previous config saved to /var/cache/conftool/dbconfig/20221025-121047-ladsgroup.json
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4002.ulsfo.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36273 and previous config saved to /var/cache/conftool/dbconfig/20221025-120509-ladsgroup.json
  • 12:00 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 11:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132203
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P36272 and previous config saved to /var/cache/conftool/dbconfig/20221025-115540-ladsgroup.json
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4002.ulsfo.wmnet
  • 11:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132203
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P36271 and previous config saved to /var/cache/conftool/dbconfig/20221025-115002-ladsgroup.json
  • 11:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 11:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti4002.ulsfo.wmnet with reason: Remove from cluster for eventual decom
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P36270 and previous config saved to /var/cache/conftool/dbconfig/20221025-114034-ladsgroup.json
  • 11:38 jbond@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp2002.wikimedia.org
  • 11:37 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test2002.wikimedia.org
  • 11:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36269 and previous config saved to /var/cache/conftool/dbconfig/20221025-113455-ladsgroup.json
  • 11:34 jbond@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp2002.wikimedia.org
  • 11:33 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test2002.wikimedia.org
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2180 (T321312)', diff saved to https://phabricator.wikimedia.org/P36268 and previous config saved to /var/cache/conftool/dbconfig/20221025-112848-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36267 and previous config saved to /var/cache/conftool/dbconfig/20221025-112822-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36266 and previous config saved to /var/cache/conftool/dbconfig/20221025-112527-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1168 (T321312)', diff saved to https://phabricator.wikimedia.org/P36265 and previous config saved to /var/cache/conftool/dbconfig/20221025-111930-ladsgroup.json
  • 11:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36264 and previous config saved to /var/cache/conftool/dbconfig/20221025-111906-ladsgroup.json
  • 11:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36263 and previous config saved to /var/cache/conftool/dbconfig/20221025-111316-ladsgroup.json
  • 11:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P36262 and previous config saved to /var/cache/conftool/dbconfig/20221025-110359-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316', diff saved to https://phabricator.wikimedia.org/P36261 and previous config saved to /var/cache/conftool/dbconfig/20221025-105810-ladsgroup.json
  • 10:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P36260 and previous config saved to /var/cache/conftool/dbconfig/20221025-104852-ladsgroup.json
  • 10:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36259 and previous config saved to /var/cache/conftool/dbconfig/20221025-104303-ladsgroup.json
  • 10:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36258 and previous config saved to /var/cache/conftool/dbconfig/20221025-104047-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36257 and previous config saved to /var/cache/conftool/dbconfig/20221025-103346-ladsgroup.json
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 10:31 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1165 (T321312)', diff saved to https://phabricator.wikimedia.org/P36256 and previous config saved to /var/cache/conftool/dbconfig/20221025-102724-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36255 and previous config saved to /var/cache/conftool/dbconfig/20221025-102642-ladsgroup.json
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36254 and previous config saved to /var/cache/conftool/dbconfig/20221025-102540-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P36253 and previous config saved to /var/cache/conftool/dbconfig/20221025-101135-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P36252 and previous config saved to /var/cache/conftool/dbconfig/20221025-101034-ladsgroup.json
  • 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P36251 and previous config saved to /var/cache/conftool/dbconfig/20221025-095629-ladsgroup.json
  • 09:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36250 and previous config saved to /var/cache/conftool/dbconfig/20221025-095527-ladsgroup.json
  • 09:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36249 and previous config saved to /var/cache/conftool/dbconfig/20221025-094921-ladsgroup.json
  • 09:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36248 and previous config saved to /var/cache/conftool/dbconfig/20221025-094800-ladsgroup.json
  • 09:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 09:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36247 and previous config saved to /var/cache/conftool/dbconfig/20221025-094733-ladsgroup.json
  • 09:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36246 and previous config saved to /var/cache/conftool/dbconfig/20221025-094122-ladsgroup.json
  • 09:36 moritzm: drain ganeti4002 for eventual decom T317247
  • 09:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T321312)', diff saved to https://phabricator.wikimedia.org/P36245 and previous config saved to /var/cache/conftool/dbconfig/20221025-093513-ladsgroup.json
  • 09:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36244 and previous config saved to /var/cache/conftool/dbconfig/20221025-093449-ladsgroup.json
  • 09:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36243 and previous config saved to /var/cache/conftool/dbconfig/20221025-093226-ladsgroup.json
  • 09:31 vgutierrez: restart pybal on lvs5003
  • 09:27 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 09:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P36241 and previous config saved to /var/cache/conftool/dbconfig/20221025-091942-ladsgroup.json
  • 09:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P36240 and previous config saved to /var/cache/conftool/dbconfig/20221025-091720-ladsgroup.json
  • 09:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 09:14 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 09:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316', diff saved to https://phabricator.wikimedia.org/P36239 and previous config saved to /var/cache/conftool/dbconfig/20221025-090436-ladsgroup.json
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36238 and previous config saved to /var/cache/conftool/dbconfig/20221025-090213-ladsgroup.json
  • 08:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 08:57 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 08:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36237 and previous config saved to /var/cache/conftool/dbconfig/20221025-085558-ladsgroup.json
  • 08:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 08:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36236 and previous config saved to /var/cache/conftool/dbconfig/20221025-085527-ladsgroup.json
  • 08:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36235 and previous config saved to /var/cache/conftool/dbconfig/20221025-084929-ladsgroup.json
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36234 and previous config saved to /var/cache/conftool/dbconfig/20221025-084713-ladsgroup.json
  • 08:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36233 and previous config saved to /var/cache/conftool/dbconfig/20221025-084541-root.json
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36232 and previous config saved to /var/cache/conftool/dbconfig/20221025-084020-ladsgroup.json
  • 08:36 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36230 and previous config saved to /var/cache/conftool/dbconfig/20221025-083206-ladsgroup.json
  • 08:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36229 and previous config saved to /var/cache/conftool/dbconfig/20221025-083034-root.json
  • 08:29 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1065.eqiad.wmnet
  • 08:26 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P36228 and previous config saved to /var/cache/conftool/dbconfig/20221025-082514-ladsgroup.json
  • 08:20 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 08:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 08:18 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:17 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 08:17 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:17 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P36227 and previous config saved to /var/cache/conftool/dbconfig/20221025-081700-ladsgroup.json
  • 08:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36226 and previous config saved to /var/cache/conftool/dbconfig/20221025-081529-root.json
  • 08:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:12 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.7 refs T320512
  • 08:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 08:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36225 and previous config saved to /var/cache/conftool/dbconfig/20221025-081007-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2129 (T321312)', diff saved to https://phabricator.wikimedia.org/P36224 and previous config saved to /var/cache/conftool/dbconfig/20221025-080238-ladsgroup.json
  • 08:02 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 08:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36223 and previous config saved to /var/cache/conftool/dbconfig/20221025-080212-ladsgroup.json
  • 08:02 moritzm: drain ganeti1023 for eventual reimage T311687
  • 08:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36222 and previous config saved to /var/cache/conftool/dbconfig/20221025-080153-ladsgroup.json
  • 08:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36221 and previous config saved to /var/cache/conftool/dbconfig/20221025-080024-root.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36220 and previous config saved to /var/cache/conftool/dbconfig/20221025-075657-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36219 and previous config saved to /var/cache/conftool/dbconfig/20221025-075638-ladsgroup.json
  • 07:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36218 and previous config saved to /var/cache/conftool/dbconfig/20221025-075613-ladsgroup.json
  • 07:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36217 and previous config saved to /var/cache/conftool/dbconfig/20221025-074705-ladsgroup.json
  • 07:45 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36216 and previous config saved to /var/cache/conftool/dbconfig/20221025-074519-root.json
  • 07:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P36215 and previous config saved to /var/cache/conftool/dbconfig/20221025-074106-ladsgroup.json
  • 07:38 moritzm: installing 5.10.149-2 update on bullseye hosts (regression doesn't concern any of our servers, but still makes sense to have further reboots move to the latest kernel)
  • 07:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P36213 and previous config saved to /var/cache/conftool/dbconfig/20221025-073159-ladsgroup.json
  • 07:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 07:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
  • 07:30 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36212 and previous config saved to /var/cache/conftool/dbconfig/20221025-073014-root.json
  • 07:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 07:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316', diff saved to https://phabricator.wikimedia.org/P36211 and previous config saved to /var/cache/conftool/dbconfig/20221025-072600-ladsgroup.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
  • 07:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36210 and previous config saved to /var/cache/conftool/dbconfig/20221025-071652-ladsgroup.json
  • 07:15 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36209 and previous config saved to /var/cache/conftool/dbconfig/20221025-071509-root.json
  • 07:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36208 and previous config saved to /var/cache/conftool/dbconfig/20221025-071053-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2124 (T321312)', diff saved to https://phabricator.wikimedia.org/P36207 and previous config saved to /var/cache/conftool/dbconfig/20221025-070922-ladsgroup.json
  • 07:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 07:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36206 and previous config saved to /var/cache/conftool/dbconfig/20221025-070856-ladsgroup.json
  • 07:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36205 and previous config saved to /var/cache/conftool/dbconfig/20221025-070837-ladsgroup.json
  • 07:00 marostegui@cumin1001: dbctl commit (dc=all): 'db1202 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P36204 and previous config saved to /var/cache/conftool/dbconfig/20221025-070004-root.json
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 06:58 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 06:57 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 06:56 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36203 and previous config saved to /var/cache/conftool/dbconfig/20221025-065350-ladsgroup.json
  • 06:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36202 and previous config saved to /var/cache/conftool/dbconfig/20221025-065330-ladsgroup.json
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117', diff saved to https://phabricator.wikimedia.org/P36201 and previous config saved to /var/cache/conftool/dbconfig/20221025-063843-ladsgroup.json
  • 06:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7795
  • 06:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P36200 and previous config saved to /var/cache/conftool/dbconfig/20221025-063824-ladsgroup.json
  • 06:36 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7795
  • 06:34 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 1206 hosts
  • 06:33 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 1206 hosts
  • 06:33 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bruno Scarone out of all services on: 799 hosts
  • 06:32 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bruno Scarone out of all services on: 799 hosts
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36199 and previous config saved to /var/cache/conftool/dbconfig/20221025-062337-ladsgroup.json
  • 06:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36198 and previous config saved to /var/cache/conftool/dbconfig/20221025-062318-ladsgroup.json
  • 06:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36196 and previous config saved to /var/cache/conftool/dbconfig/20221025-061710-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2117 (T321312)', diff saved to https://phabricator.wikimedia.org/P36195 and previous config saved to /var/cache/conftool/dbconfig/20221025-061621-ladsgroup.json
  • 06:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2117.codfw.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P36194 and previous config saved to /var/cache/conftool/dbconfig/20221025-061552-ladsgroup.json
  • 06:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:15 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 06:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1160 T321177', diff saved to https://phabricator.wikimedia.org/P36193 and previous config saved to /var/cache/conftool/dbconfig/20221025-060643-ladsgroup.json
  • 06:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Promote db1138 to s4 primary and set section read-write T321177', diff saved to https://phabricator.wikimedia.org/P36192 and previous config saved to /var/cache/conftool/dbconfig/20221025-060118-ladsgroup.json
  • 06:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T321177', diff saved to https://phabricator.wikimedia.org/P36191 and previous config saved to /var/cache/conftool/dbconfig/20221025-060043-ladsgroup.json
  • 06:00 Amir1: Starting s4 eqiad failover from db1160 to db1138 - T321177
  • 05:56 _joe_: restarting pybal again on lvs1020, again for testing
  • 05:25 _joe_: restarting pybal on lvs1020 to test cookbook mechanism
  • 05:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set db1138 with weight 0 T321177', diff saved to https://phabricator.wikimedia.org/P36190 and previous config saved to /var/cache/conftool/dbconfig/20221025-051933-ladsgroup.json
  • 05:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s4 T321177
  • 05:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 34 hosts with reason: Primary switchover s4 T321177
  • 04:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36189 and previous config saved to /var/cache/conftool/dbconfig/20221025-041558-ladsgroup.json
  • 04:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 04:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 04:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P36188 and previous config saved to /var/cache/conftool/dbconfig/20221025-040052-ladsgroup.json
  • 03:59 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:48 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:48 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P36187 and previous config saved to /var/cache/conftool/dbconfig/20221025-034546-ladsgroup.json
  • 03:41 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:40 mwpresync@deploy1002: Pruned MediaWiki: 1.40.0-wmf.5 (duration: 01m 56s)
  • 03:38 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.7 refs T320512 (duration: 35m 54s)
  • 03:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36186 and previous config saved to /var/cache/conftool/dbconfig/20221025-033039-ladsgroup.json
  • 03:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T321312)', diff saved to https://phabricator.wikimedia.org/P36185 and previous config saved to /var/cache/conftool/dbconfig/20221025-032316-ladsgroup.json
  • 03:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 03:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36184 and previous config saved to /var/cache/conftool/dbconfig/20221025-032252-ladsgroup.json
  • 03:11 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P36183 and previous config saved to /var/cache/conftool/dbconfig/20221025-030745-ladsgroup.json
  • 03:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.7 refs T320512
  • 02:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P36182 and previous config saved to /var/cache/conftool/dbconfig/20221025-025239-ladsgroup.json
  • 02:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36181 and previous config saved to /var/cache/conftool/dbconfig/20221025-024709-ladsgroup.json
  • 02:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36180 and previous config saved to /var/cache/conftool/dbconfig/20221025-023733-ladsgroup.json
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P36179 and previous config saved to /var/cache/conftool/dbconfig/20221025-023203-ladsgroup.json
  • 02:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1191 (T321312)', diff saved to https://phabricator.wikimedia.org/P36178 and previous config saved to /var/cache/conftool/dbconfig/20221025-023120-ladsgroup.json
  • 02:31 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 02:31 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 02:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36177 and previous config saved to /var/cache/conftool/dbconfig/20221025-023056-ladsgroup.json
  • 02:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P36176 and previous config saved to /var/cache/conftool/dbconfig/20221025-021656-ladsgroup.json
  • 02:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P36175 and previous config saved to /var/cache/conftool/dbconfig/20221025-021550-ladsgroup.json
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36174 and previous config saved to /var/cache/conftool/dbconfig/20221025-020150-ladsgroup.json
  • 02:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P36173 and previous config saved to /var/cache/conftool/dbconfig/20221025-020043-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T321312)', diff saved to https://phabricator.wikimedia.org/P36172 and previous config saved to /var/cache/conftool/dbconfig/20221025-015528-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36171 and previous config saved to /var/cache/conftool/dbconfig/20221025-015502-ladsgroup.json
  • 01:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36170 and previous config saved to /var/cache/conftool/dbconfig/20221025-014536-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P36169 and previous config saved to /var/cache/conftool/dbconfig/20221025-013956-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36168 and previous config saved to /var/cache/conftool/dbconfig/20221025-013917-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 01:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36167 and previous config saved to /var/cache/conftool/dbconfig/20221025-013852-ladsgroup.json
  • 01:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P36166 and previous config saved to /var/cache/conftool/dbconfig/20221025-012449-ladsgroup.json
  • 01:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P36165 and previous config saved to /var/cache/conftool/dbconfig/20221025-012345-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36164 and previous config saved to /var/cache/conftool/dbconfig/20221025-010943-ladsgroup.json
  • 01:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317', diff saved to https://phabricator.wikimedia.org/P36163 and previous config saved to /var/cache/conftool/dbconfig/20221025-010839-ladsgroup.json
  • 01:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36162 and previous config saved to /var/cache/conftool/dbconfig/20221025-010225-ladsgroup.json
  • 00:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36161 and previous config saved to /var/cache/conftool/dbconfig/20221025-005332-ladsgroup.json
  • 00:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36160 and previous config saved to /var/cache/conftool/dbconfig/20221025-004817-ladsgroup.json
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36159 and previous config saved to /var/cache/conftool/dbconfig/20221025-004718-ladsgroup.json
  • 00:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312', diff saved to https://phabricator.wikimedia.org/P36158 and previous config saved to /var/cache/conftool/dbconfig/20221025-003310-ladsgroup.json
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3316', diff saved to https://phabricator.wikimedia.org/P36157 and previous config saved to /var/cache/conftool/dbconfig/20221025-003211-ladsgroup.json
  • 00:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36151 and previous config saved to /var/cache/conftool/dbconfig/20221025-000257-ladsgroup.json

2022-10-24

  • 23:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36150 and previous config saved to /var/cache/conftool/dbconfig/20221024-235804-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36149 and previous config saved to /var/cache/conftool/dbconfig/20221024-235645-ladsgroup.json
  • 23:56 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36148 and previous config saved to /var/cache/conftool/dbconfig/20221024-235618-ladsgroup.json
  • 23:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P36147 and previous config saved to /var/cache/conftool/dbconfig/20221024-235357-ladsgroup.json
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P36146 and previous config saved to /var/cache/conftool/dbconfig/20221024-234111-ladsgroup.json
  • 23:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P36145 and previous config saved to /var/cache/conftool/dbconfig/20221024-233849-ladsgroup.json
  • 23:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P36144 and previous config saved to /var/cache/conftool/dbconfig/20221024-232604-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T321312)', diff saved to https://phabricator.wikimedia.org/P36143 and previous config saved to /var/cache/conftool/dbconfig/20221024-232343-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T321312)', diff saved to https://phabricator.wikimedia.org/P36142 and previous config saved to /var/cache/conftool/dbconfig/20221024-231721-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 23:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36141 and previous config saved to /var/cache/conftool/dbconfig/20221024-231629-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36140 and previous config saved to /var/cache/conftool/dbconfig/20221024-231058-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T321312)', diff saved to https://phabricator.wikimedia.org/P36139 and previous config saved to /var/cache/conftool/dbconfig/20221024-230446-ladsgroup.json
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 23:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36138 and previous config saved to /var/cache/conftool/dbconfig/20221024-230405-ladsgroup.json
  • 23:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P36137 and previous config saved to /var/cache/conftool/dbconfig/20221024-230122-ladsgroup.json
  • 23:00 TimStarling: on mwmaint1002 running renameInvalidUsernames.php for T292552
  • 22:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P36136 and previous config saved to /var/cache/conftool/dbconfig/20221024-224858-ladsgroup.json
  • 22:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P36135 and previous config saved to /var/cache/conftool/dbconfig/20221024-224616-ladsgroup.json
  • 22:35 cstone: civicrm upgraded from 89a46665 to 4cb2d91e
  • 22:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P36134 and previous config saved to /var/cache/conftool/dbconfig/20221024-223352-ladsgroup.json
  • 22:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36133 and previous config saved to /var/cache/conftool/dbconfig/20221024-223109-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T321312)', diff saved to https://phabricator.wikimedia.org/P36131 and previous config saved to /var/cache/conftool/dbconfig/20221024-222444-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36130 and previous config saved to /var/cache/conftool/dbconfig/20221024-222418-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36129 and previous config saved to /var/cache/conftool/dbconfig/20221024-221845-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T321312)', diff saved to https://phabricator.wikimedia.org/P36128 and previous config saved to /var/cache/conftool/dbconfig/20221024-221227-ladsgroup.json
  • 22:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 22:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321312)', diff saved to https://phabricator.wikimedia.org/P36127 and previous config saved to /var/cache/conftool/dbconfig/20221024-221203-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P36126 and previous config saved to /var/cache/conftool/dbconfig/20221024-220912-ladsgroup.json
  • 21:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P36125 and previous config saved to /var/cache/conftool/dbconfig/20221024-215657-ladsgroup.json
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P36124 and previous config saved to /var/cache/conftool/dbconfig/20221024-215405-ladsgroup.json
  • 21:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P36123 and previous config saved to /var/cache/conftool/dbconfig/20221024-214150-ladsgroup.json
  • 21:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36122 and previous config saved to /var/cache/conftool/dbconfig/20221024-213859-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T321312)', diff saved to https://phabricator.wikimedia.org/P36121 and previous config saved to /var/cache/conftool/dbconfig/20221024-213032-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36120 and previous config saved to /var/cache/conftool/dbconfig/20221024-213006-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T321312)', diff saved to https://phabricator.wikimedia.org/P36119 and previous config saved to /var/cache/conftool/dbconfig/20221024-212644-ladsgroup.json
  • 21:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1001.wikimedia.org
  • 21:09 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudrabbit1003.wikimedia.org
  • 21:06 volans: uploaded python3-gjson_0.2.0 to apt.wikimedia.org bullseye-wikimedia
  • 21:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P36115 and previous config saved to /var/cache/conftool/dbconfig/20221024-210508-ladsgroup.json
  • 21:04 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1007.wikimedia.org
  • 21:02 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudrabbit1003.wikimedia.org
  • 21:01 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices1004.wikimedia.org
  • 20:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P36114 and previous config saved to /var/cache/conftool/dbconfig/20221024-205953-ladsgroup.json
  • 20:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P36113 and previous config saved to /var/cache/conftool/dbconfig/20221024-205002-ladsgroup.json
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1004.wikimedia.org
  • 20:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1007.wikimedia.org
  • 20:49 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:49 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:48 urbanecm: UTC late B&C window completed
  • 20:48 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:45 urbanecm@deploy1002: Finished scap: Backport for logos: Automate icon generation (T319223) (duration: 08m 49s)
  • 20:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36112 and previous config saved to /var/cache/conftool/dbconfig/20221024-204446-ladsgroup.json
  • 20:41 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1006.wikimedia.org
  • 20:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T321312)', diff saved to https://phabricator.wikimedia.org/P36111 and previous config saved to /var/cache/conftool/dbconfig/20221024-203713-ladsgroup.json
  • 20:37 urbanecm@deploy1002: urbanecm and stang: Backport for logos: Automate icon generation (T319223) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 20:36 urbanecm@deploy1002: Started scap: Backport for logos: Automate icon generation (T319223)
  • 20:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36110 and previous config saved to /var/cache/conftool/dbconfig/20221024-203647-ladsgroup.json
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620) (duration: 07m 02s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36108 and previous config saved to /var/cache/conftool/dbconfig/20221024-203455-ladsgroup.json
  • 20:34 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudservices1005.wikimedia.org
  • 20:29 urbanecm@deploy1002: urbanecm and stang: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:29 urbanecm@deploy1002: Started scap: Backport for Add wmgSiteLogoVariants support to Chinese projects (T308620)
  • 20:28 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1006.wikimedia.org
  • 20:28 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol1005.wikimedia.org
  • 20:27 urbanecm@deploy1002: Finished scap: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688) (duration: 06m 38s)
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36107 and previous config saved to /var/cache/conftool/dbconfig/20221024-202738-ladsgroup.json
  • 20:26 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices1005.wikimedia.org
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:25 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudbackup1002-dev.eqiad.wmnet
  • 20:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P36106 and previous config saved to /var/cache/conftool/dbconfig/20221024-202141-ladsgroup.json
  • 20:21 urbanecm@deploy1002: urbanecm and matmarex: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:21 urbanecm@deploy1002: Started scap: Backport for Allow 'nofollow' on external links in Parsoid output (T321437), Retry without RESTBase when the page/revision seems to be missing (T315688)
  • 20:17 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.wikimedia.org
  • 20:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
  • 20:14 urbanecm@deploy1002: Finished scap: Backport for Promote several Wikipedias to desktop improvements group (T319012) (duration: 05m 53s)
  • 20:13 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P36105 and previous config saved to /var/cache/conftool/dbconfig/20221024-201232-ladsgroup.json
  • 20:09 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:09 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Promote several Wikipedias to desktop improvements group (T319012) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:08 urbanecm@deploy1002: Started scap: Backport for Promote several Wikipedias to desktop improvements group (T319012)
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:08 urbanecm@deploy1002: Finished scap: Backport for Unset some bad logos (duration: 06m 07s)
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P36104 and previous config saved to /var/cache/conftool/dbconfig/20221024-200634-ladsgroup.json
  • 20:02 urbanecm@deploy1002: urbanecm and jdlrobson: Backport for Unset some bad logos synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 20:01 urbanecm@deploy1002: Started scap: Backport for Unset some bad logos
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316', diff saved to https://phabricator.wikimedia.org/P36103 and previous config saved to /var/cache/conftool/dbconfig/20221024-195725-ladsgroup.json
  • 19:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36102 and previous config saved to /var/cache/conftool/dbconfig/20221024-195128-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2120 (T321312)', diff saved to https://phabricator.wikimedia.org/P36101 and previous config saved to /var/cache/conftool/dbconfig/20221024-194452-ladsgroup.json
  • 19:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 19:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36100 and previous config saved to /var/cache/conftool/dbconfig/20221024-194416-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36099 and previous config saved to /var/cache/conftool/dbconfig/20221024-194219-ladsgroup.json
  • 19:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P36098 and previous config saved to /var/cache/conftool/dbconfig/20221024-193610-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3316 (T321312)', diff saved to https://phabricator.wikimedia.org/P36097 and previous config saved to /var/cache/conftool/dbconfig/20221024-193447-ladsgroup.json
  • 19:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 19:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P36096 and previous config saved to /var/cache/conftool/dbconfig/20221024-192909-ladsgroup.json
  • 19:27 ladsgroup@deploy1002: Finished scap: Backport for Add 'class' to LBFactory callback config (duration: 05m 20s)
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36095 and previous config saved to /var/cache/conftool/dbconfig/20221024-192251-ladsgroup.json
  • 19:22 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Add 'class' to LBFactory callback config synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 19:22 ladsgroup@deploy1002: Started scap: Backport for Add 'class' to LBFactory callback config
  • 19:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P36094 and previous config saved to /var/cache/conftool/dbconfig/20221024-191403-ladsgroup.json
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36093 and previous config saved to /var/cache/conftool/dbconfig/20221024-190745-ladsgroup.json
  • 19:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:02 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785] (duration: 00m 07s)
  • 19:02 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785]
  • 19:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:00 ladsgroup@deploy1002: Finished scap: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485) (duration: 06m 55s)
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36092 and previous config saved to /var/cache/conftool/dbconfig/20221024-185856-ladsgroup.json
  • 18:55 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:53 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 18:53 ladsgroup@deploy1002: Started scap: Backport for Avoid using DBLoadBalancerFactoryConfigBuilder mw service (T298485)
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P36091 and previous config saved to /var/cache/conftool/dbconfig/20221024-185238-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T321312)', diff saved to https://phabricator.wikimedia.org/P36090 and previous config saved to /var/cache/conftool/dbconfig/20221024-185230-ladsgroup.json
  • 18:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 18:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1112', diff saved to https://phabricator.wikimedia.org/P36089 and previous config saved to /var/cache/conftool/dbconfig/20221024-184359-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1112', diff saved to https://phabricator.wikimedia.org/P36088 and previous config saved to /var/cache/conftool/dbconfig/20221024-184239-ladsgroup.json
  • 18:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36087 and previous config saved to /var/cache/conftool/dbconfig/20221024-183732-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1196 (T321312)', diff saved to https://phabricator.wikimedia.org/P36086 and previous config saved to /var/cache/conftool/dbconfig/20221024-183015-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 18:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36085 and previous config saved to /var/cache/conftool/dbconfig/20221024-182951-ladsgroup.json
  • 18:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P36084 and previous config saved to /var/cache/conftool/dbconfig/20221024-181444-ladsgroup.json
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36082 and previous config saved to /var/cache/conftool/dbconfig/20221024-174431-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1186 (T321312)', diff saved to https://phabricator.wikimedia.org/P36081 and previous config saved to /var/cache/conftool/dbconfig/20221024-173812-ladsgroup.json
  • 17:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36080 and previous config saved to /var/cache/conftool/dbconfig/20221024-173748-ladsgroup.json
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36079 and previous config saved to /var/cache/conftool/dbconfig/20221024-172242-ladsgroup.json
  • 17:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P36078 and previous config saved to /var/cache/conftool/dbconfig/20221024-170735-ladsgroup.json
  • 16:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36077 and previous config saved to /var/cache/conftool/dbconfig/20221024-165229-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1184 (T321312)', diff saved to https://phabricator.wikimedia.org/P36076 and previous config saved to /var/cache/conftool/dbconfig/20221024-164510-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36075 and previous config saved to /var/cache/conftool/dbconfig/20221024-164446-ladsgroup.json
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36074 and previous config saved to /var/cache/conftool/dbconfig/20221024-162939-ladsgroup.json
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36073 and previous config saved to /var/cache/conftool/dbconfig/20221024-162035-ladsgroup.json
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P36072 and previous config saved to /var/cache/conftool/dbconfig/20221024-161432-ladsgroup.json
  • 16:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36071 and previous config saved to /var/cache/conftool/dbconfig/20221024-160528-ladsgroup.json
  • 16:03 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2068.codfw.wmnet
  • 15:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36070 and previous config saved to /var/cache/conftool/dbconfig/20221024-155926-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1169 (T321312)', diff saved to https://phabricator.wikimedia.org/P36069 and previous config saved to /var/cache/conftool/dbconfig/20221024-155313-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 15:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 15:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 15:51 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P36068 and previous config saved to /var/cache/conftool/dbconfig/20221024-155022-ladsgroup.json
  • 15:50 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1140.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1139.eqiad.wmnet with reason: Maintenance
  • 15:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36067 and previous config saved to /var/cache/conftool/dbconfig/20221024-154543-ladsgroup.json
  • 15:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 15:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 15:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 15:41 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 15:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36066 and previous config saved to /var/cache/conftool/dbconfig/20221024-153515-ladsgroup.json
  • 15:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 15:32 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 15:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36065 and previous config saved to /var/cache/conftool/dbconfig/20221024-153037-ladsgroup.json
  • 15:29 mforns@deploy1002: Finished deploy [airflow-dags/analytics@62b4181]: (no justification provided) (duration: 00m 11s)
  • 15:29 mforns@deploy1002: Started deploy [airflow-dags/analytics@62b4181]: (no justification provided)
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2176 (T321312)', diff saved to https://phabricator.wikimedia.org/P36064 and previous config saved to /var/cache/conftool/dbconfig/20221024-152856-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36063 and previous config saved to /var/cache/conftool/dbconfig/20221024-152830-ladsgroup.json
  • 15:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 15:26 XioNoX: drain eqiad-esams transport
  • 15:23 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 15:23 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:22 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 15:19 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 15:18 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785] (duration: 00m 09s)
  • 15:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 15:18 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785] (thin): Regular analytics weekly train THIN [analytics/refinery@d3b7785]
  • 15:18 mforns@deploy1002: Finished deploy [analytics/refinery@d3b7785]: Regular analytics weekly train [analytics/refinery@d3b7785] (duration: 05m 34s)
  • 15:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135', diff saved to https://phabricator.wikimedia.org/P36062 and previous config saved to /var/cache/conftool/dbconfig/20221024-151530-ladsgroup.json
  • 15:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36061 and previous config saved to /var/cache/conftool/dbconfig/20221024-151324-ladsgroup.json
  • 15:12 mforns@deploy1002: Started deploy [analytics/refinery@d3b7785]: Regular analytics weekly train [analytics/refinery@d3b7785]
  • 15:10 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 15:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 15:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36060 and previous config saved to /var/cache/conftool/dbconfig/20221024-150024-ladsgroup.json
  • 14:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P36059 and previous config saved to /var/cache/conftool/dbconfig/20221024-145817-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1135 (T321312)', diff saved to https://phabricator.wikimedia.org/P36058 and previous config saved to /var/cache/conftool/dbconfig/20221024-145511-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1135.eqiad.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36057 and previous config saved to /var/cache/conftool/dbconfig/20221024-145436-ladsgroup.json
  • 14:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36056 and previous config saved to /var/cache/conftool/dbconfig/20221024-144311-ladsgroup.json
  • 14:42 XioNoX: drain NTT on cr1-eqiad
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36055 and previous config saved to /var/cache/conftool/dbconfig/20221024-143930-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2174 (T321312)', diff saved to https://phabricator.wikimedia.org/P36054 and previous config saved to /var/cache/conftool/dbconfig/20221024-143650-ladsgroup.json
  • 14:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36053 and previous config saved to /var/cache/conftool/dbconfig/20221024-143625-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134', diff saved to https://phabricator.wikimedia.org/P36052 and previous config saved to /var/cache/conftool/dbconfig/20221024-142423-ladsgroup.json
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36051 and previous config saved to /var/cache/conftool/dbconfig/20221024-142118-ladsgroup.json
  • 14:16 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:15 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36050 and previous config saved to /var/cache/conftool/dbconfig/20221024-140917-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P36049 and previous config saved to /var/cache/conftool/dbconfig/20221024-140612-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1134 (T321312)', diff saved to https://phabricator.wikimedia.org/P36048 and previous config saved to /var/cache/conftool/dbconfig/20221024-140404-ladsgroup.json
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1134.eqiad.wmnet with reason: Maintenance
  • 13:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 13:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1133.eqiad.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36047 and previous config saved to /var/cache/conftool/dbconfig/20221024-135557-ladsgroup.json
  • 13:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:53 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for plwikimedia: Enable VisualEditor by default (T321308) (duration: 07m 25s)
  • 13:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 13:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36046 and previous config saved to /var/cache/conftool/dbconfig/20221024-135105-ladsgroup.json
  • 13:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 13:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and stang: Backport for plwikimedia: Enable VisualEditor by default (T321308) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:46 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for plwikimedia: Enable VisualEditor by default (T321308)
  • 13:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2173 (T321312)', diff saved to https://phabricator.wikimedia.org/P36045 and previous config saved to /var/cache/conftool/dbconfig/20221024-134437-ladsgroup.json
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36044 and previous config saved to /var/cache/conftool/dbconfig/20221024-134356-ladsgroup.json
  • 13:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 13:42 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:41 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 13:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128', diff saved to https://phabricator.wikimedia.org/P36043 and previous config saved to /var/cache/conftool/dbconfig/20221024-134050-ladsgroup.json
  • 13:20 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:16 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:16 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P36040 and previous config saved to /var/cache/conftool/dbconfig/20221024-131343-ladsgroup.json
  • 13:12 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 13:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36039 and previous config saved to /var/cache/conftool/dbconfig/20221024-131037-ladsgroup.json
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:06 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1128 (T321312)', diff saved to https://phabricator.wikimedia.org/P36038 and previous config saved to /var/cache/conftool/dbconfig/20221024-130413-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1128.eqiad.wmnet with reason: Maintenance
  • 13:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36037 and previous config saved to /var/cache/conftool/dbconfig/20221024-130349-ladsgroup.json
  • 13:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 13:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 12:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36036 and previous config saved to /var/cache/conftool/dbconfig/20221024-125836-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36035 and previous config saved to /var/cache/conftool/dbconfig/20221024-125420-ladsgroup.json
  • 12:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 12:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 12:49 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36034 and previous config saved to /var/cache/conftool/dbconfig/20221024-124842-ladsgroup.json
  • 12:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 12:48 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36033 and previous config saved to /var/cache/conftool/dbconfig/20221024-123913-ladsgroup.json
  • 12:35 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 12:34 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 12:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 12:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119', diff saved to https://phabricator.wikimedia.org/P36032 and previous config saved to /var/cache/conftool/dbconfig/20221024-123336-ladsgroup.json
  • 12:25 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 12:24 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 12:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311', diff saved to https://phabricator.wikimedia.org/P36031 and previous config saved to /var/cache/conftool/dbconfig/20221024-122407-ladsgroup.json
  • 12:23 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 12:22 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 12:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36030 and previous config saved to /var/cache/conftool/dbconfig/20221024-121829-ladsgroup.json
  • 12:15 dcausse: restarting blazegraph on wdqs1005, wdqs1006, wdqs1012 and wdqs1016 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:14 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 12:13 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 12:13 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1054.eqiad.wmnet
  • 12:13 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1119 (T321312)', diff saved to https://phabricator.wikimedia.org/P36029 and previous config saved to /var/cache/conftool/dbconfig/20221024-121058-ladsgroup.json
  • 12:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1119.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36028 and previous config saved to /var/cache/conftool/dbconfig/20221024-121034-ladsgroup.json
  • 12:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36027 and previous config saved to /var/cache/conftool/dbconfig/20221024-120900-ladsgroup.json
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P36025 and previous config saved to /var/cache/conftool/dbconfig/20221024-120153-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P36024 and previous config saved to /var/cache/conftool/dbconfig/20221024-120026-ladsgroup.json
  • 12:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36023 and previous config saved to /var/cache/conftool/dbconfig/20221024-115959-ladsgroup.json
  • 11:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36022 and previous config saved to /var/cache/conftool/dbconfig/20221024-115528-ladsgroup.json
  • 11:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 11:54 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36021 and previous config saved to /var/cache/conftool/dbconfig/20221024-114452-ladsgroup.json
  • 11:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118', diff saved to https://phabricator.wikimedia.org/P36020 and previous config saved to /var/cache/conftool/dbconfig/20221024-114022-ladsgroup.json
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P36019 and previous config saved to /var/cache/conftool/dbconfig/20221024-112946-ladsgroup.json
  • 11:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36018 and previous config saved to /var/cache/conftool/dbconfig/20221024-112515-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1118 (T321312)', diff saved to https://phabricator.wikimedia.org/P36017 and previous config saved to /var/cache/conftool/dbconfig/20221024-111849-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1118.eqiad.wmnet with reason: Maintenance
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36016 and previous config saved to /var/cache/conftool/dbconfig/20221024-111825-ladsgroup.json
  • 11:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1189', diff saved to https://phabricator.wikimedia.org/P36015 and previous config saved to /var/cache/conftool/dbconfig/20221024-111813-ladsgroup.json
  • 11:17 ladsgroup@deploy1002: Finished scap: Backport for Enable LBFactory config callback in CLI (T298485) (duration: 08m 35s)
  • 11:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 11:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36014 and previous config saved to /var/cache/conftool/dbconfig/20221024-111439-ladsgroup.json
  • 11:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1189', diff saved to https://phabricator.wikimedia.org/P36013 and previous config saved to /var/cache/conftool/dbconfig/20221024-111121-ladsgroup.json
  • 11:09 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Enable LBFactory config callback in CLI (T298485) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 11:09 ladsgroup@deploy1002: Started scap: Backport for Enable LBFactory config callback in CLI (T298485)
  • 11:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2153 (T321312)', diff saved to https://phabricator.wikimedia.org/P36012 and previous config saved to /var/cache/conftool/dbconfig/20221024-110822-ladsgroup.json
  • 11:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 11:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36011 and previous config saved to /var/cache/conftool/dbconfig/20221024-110756-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36010 and previous config saved to /var/cache/conftool/dbconfig/20221024-110318-ladsgroup.json
  • 10:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 10:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36009 and previous config saved to /var/cache/conftool/dbconfig/20221024-105250-ladsgroup.json
  • 10:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107', diff saved to https://phabricator.wikimedia.org/P36008 and previous config saved to /var/cache/conftool/dbconfig/20221024-104812-ladsgroup.json
  • 10:47 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 10:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 10:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 10:42 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 10:42 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 10:39 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 10:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P36007 and previous config saved to /var/cache/conftool/dbconfig/20221024-103743-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36006 and previous config saved to /var/cache/conftool/dbconfig/20221024-103305-ladsgroup.json
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 10:32 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 10:30 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 10:29 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1050.eqiad.wmnet
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1107 (T321312)', diff saved to https://phabricator.wikimedia.org/P36005 and previous config saved to /var/cache/conftool/dbconfig/20221024-102636-ladsgroup.json
  • 10:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1107.eqiad.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P36004 and previous config saved to /var/cache/conftool/dbconfig/20221024-102612-ladsgroup.json
  • 10:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36003 and previous config saved to /var/cache/conftool/dbconfig/20221024-102237-ladsgroup.json
  • 10:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1050.eqiad.wmnet
  • 10:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1049.eqiad.wmnet
  • 10:15 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 10:15 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 10:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2146 (T321312)', diff saved to https://phabricator.wikimedia.org/P36002 and previous config saved to /var/cache/conftool/dbconfig/20221024-101518-ladsgroup.json
  • 10:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P36001 and previous config saved to /var/cache/conftool/dbconfig/20221024-101453-ladsgroup.json
  • 10:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P36000 and previous config saved to /var/cache/conftool/dbconfig/20221024-101105-ladsgroup.json
  • 10:07 Emperor: upload wmf-beamer-style 0.3 to apt
  • 10:07 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:06 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 10:04 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes-staging,service=kubemaster
  • 10:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1049.eqiad.wmnet
  • 09:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P35999 and previous config saved to /var/cache/conftool/dbconfig/20221024-095946-ladsgroup.json
  • 09:58 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1048.eqiad.wmnet
  • 09:57 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 09:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P35998 and previous config saved to /var/cache/conftool/dbconfig/20221024-095559-ladsgroup.json
  • 09:53 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 09:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P35997 and previous config saved to /var/cache/conftool/dbconfig/20221024-094440-ladsgroup.json
  • 09:43 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1048.eqiad.wmnet
  • 09:41 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2002.codfw.wmnet
  • 09:41 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 09:41 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 09:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35996 and previous config saved to /var/cache/conftool/dbconfig/20221024-094052-ladsgroup.json
  • 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 09:35 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2002.codfw.wmnet
  • 09:34 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2001.codfw.wmnet
  • 09:34 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 09:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P35995 and previous config saved to /var/cache/conftool/dbconfig/20221024-092933-ladsgroup.json
  • 09:27 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 09:27 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubemaster,name=kubemaster2001.codfw.wmnet
  • 09:25 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2016.codfw.wmnet
  • 09:23 oblivian@deploy1002: Finished scap: Backport for Stop assigning the PHP_ENGINE cookie (T271736) (duration: 04m 59s)
  • 09:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2145 (T321312)', diff saved to https://phabricator.wikimedia.org/P35994 and previous config saved to /var/cache/conftool/dbconfig/20221024-092310-ladsgroup.json
  • 09:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:20 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2016.codfw.wmnet
  • 09:19 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2016.codfw.wmnet
  • 09:19 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2015.codfw.wmnet
  • 09:18 oblivian@deploy1002: oblivian and oblivian: Backport for Stop assigning the PHP_ENGINE cookie (T271736) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 09:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2015.codfw.wmnet
  • 09:18 oblivian@deploy1002: Started scap: Backport for Stop assigning the PHP_ENGINE cookie (T271736)
  • 09:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35993 and previous config saved to /var/cache/conftool/dbconfig/20221024-091801-ladsgroup.json
  • 09:17 claime: kubernetes2015:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 09:15 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2051.codfw.wmnet
  • 09:12 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2015.codfw.wmnet
  • 09:11 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2015.codfw.wmnet
  • 09:10 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1047.eqiad.wmnet
  • 09:09 claime: Starting October reboots of lingering wikikube codfw hosts
  • 09:08 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1002.eqiad.wmnet
  • 09:08 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 09:03 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1047.eqiad.wmnet
  • 09:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P35991 and previous config saved to /var/cache/conftool/dbconfig/20221024-090255-ladsgroup.json
  • 09:01 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 09:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1002.eqiad.wmnet
  • 09:01 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1046.eqiad.wmnet
  • 09:01 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 09:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2050.codfw.wmnet
  • 09:00 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1001.eqiad.wmnet
  • 08:59 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 08:53 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 08:52 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubemaster,name=kubemaster1001.eqiad.wmnet
  • 08:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2050.codfw.wmnet
  • 08:51 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2049.codfw.wmnet
  • 08:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1046.eqiad.wmnet
  • 08:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P35990 and previous config saved to /var/cache/conftool/dbconfig/20221024-084748-ladsgroup.json
  • 08:45 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1016.eqiad.wmnet
  • 08:44 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1016.eqiad.wmnet
  • 08:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 199524
  • 08:44 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1045.eqiad.wmnet
  • 08:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 199524
  • 08:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35989 and previous config saved to /var/cache/conftool/dbconfig/20221024-084037-ladsgroup.json
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:40 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 08:39 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31042
  • 08:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35988 and previous config saved to /var/cache/conftool/dbconfig/20221024-083955-ladsgroup.json
  • 08:39 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1016.eqiad.wmnet
  • 08:39 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31042
  • 08:38 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1016.eqiad.wmnet
  • 08:38 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1015.eqiad.wmnet
  • 08:37 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2049.codfw.wmnet
  • 08:37 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1015.eqiad.wmnet
  • 08:37 claime: kubernetes1015:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 08:36 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1045.eqiad.wmnet
  • 08:35 Emperor: set thanos ring replicas to 3.50 T311690
  • 08:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35987 and previous config saved to /var/cache/conftool/dbconfig/20221024-083242-ladsgroup.json
  • 08:30 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1015.eqiad.wmnet
  • 08:29 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1015.eqiad.wmnet
  • 08:28 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1006.eqiad.wmnet
  • 08:28 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1006.eqiad.wmnet
  • 08:27 claime: kubernetes1006:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3303
  • 08:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2130 (T321312)', diff saved to https://phabricator.wikimedia.org/P35986 and previous config saved to /var/cache/conftool/dbconfig/20221024-082605-ladsgroup.json
  • 08:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 08:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T321312)', diff saved to https://phabricator.wikimedia.org/P35985 and previous config saved to /var/cache/conftool/dbconfig/20221024-082540-ladsgroup.json
  • 08:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P35984 and previous config saved to /var/cache/conftool/dbconfig/20221024-082448-ladsgroup.json
  • 08:22 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3303
  • 08:19 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes1006.eqiad.wmnet
  • 08:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 08:12 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8075
  • 08:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P35983 and previous config saved to /var/cache/conftool/dbconfig/20221024-081033-ladsgroup.json
  • 08:09 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8075
  • 08:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1105:3312', diff saved to https://phabricator.wikimedia.org/P35982 and previous config saved to /var/cache/conftool/dbconfig/20221024-080942-ladsgroup.json
  • 08:08 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 132337
  • 08:07 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 132337
  • 08:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32787
  • 08:05 ladsgroup@deploy1002: Finished scap: Backport for Enable source links on Translation ns on enwikisource and thwikisource (T53980) (duration: 09m 18s)
  • 08:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32787
  • 08:01 cgoubert@cumin1001: conftool action : set/pooled=no; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1006.eqiad.wmnet
  • 08:00 claime: Starting October reboots of lingering wikikube eqiad hosts

2022-10-22

  • 03:16 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 00:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1005.eqiad.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35965 and previous config saved to /var/cache/conftool/dbconfig/20221022-000300-ladsgroup.json

2022-10-21

  • 23:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P35964 and previous config saved to /var/cache/conftool/dbconfig/20221021-234754-ladsgroup.json
  • 23:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P35963 and previous config saved to /var/cache/conftool/dbconfig/20221021-233247-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35962 and previous config saved to /var/cache/conftool/dbconfig/20221021-231741-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1203 (T321312)', diff saved to https://phabricator.wikimedia.org/P35961 and previous config saved to /var/cache/conftool/dbconfig/20221021-231026-ladsgroup.json
  • 23:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 23:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35960 and previous config saved to /var/cache/conftool/dbconfig/20221021-231001-ladsgroup.json
  • 22:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P35959 and previous config saved to /var/cache/conftool/dbconfig/20221021-225455-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P35958 and previous config saved to /var/cache/conftool/dbconfig/20221021-223948-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35957 and previous config saved to /var/cache/conftool/dbconfig/20221021-222442-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1193 (T321312)', diff saved to https://phabricator.wikimedia.org/P35956 and previous config saved to /var/cache/conftool/dbconfig/20221021-221826-ladsgroup.json
  • 22:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 22:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35955 and previous config saved to /var/cache/conftool/dbconfig/20221021-221802-ladsgroup.json
  • 22:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P35954 and previous config saved to /var/cache/conftool/dbconfig/20221021-220256-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P35953 and previous config saved to /var/cache/conftool/dbconfig/20221021-214749-ladsgroup.json
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35952 and previous config saved to /var/cache/conftool/dbconfig/20221021-213242-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1192 (T321312)', diff saved to https://phabricator.wikimedia.org/P35951 and previous config saved to /var/cache/conftool/dbconfig/20221021-212629-ladsgroup.json
  • 21:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 21:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35950 and previous config saved to /var/cache/conftool/dbconfig/20221021-212604-ladsgroup.json
  • 21:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P35949 and previous config saved to /var/cache/conftool/dbconfig/20221021-211058-ladsgroup.json
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P35948 and previous config saved to /var/cache/conftool/dbconfig/20221021-205551-ladsgroup.json
  • 20:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35947 and previous config saved to /var/cache/conftool/dbconfig/20221021-204045-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1178 (T321312)', diff saved to https://phabricator.wikimedia.org/P35946 and previous config saved to /var/cache/conftool/dbconfig/20221021-203430-ladsgroup.json
  • 20:34 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 20:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35945 and previous config saved to /var/cache/conftool/dbconfig/20221021-203406-ladsgroup.json
  • 20:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35944 and previous config saved to /var/cache/conftool/dbconfig/20221021-202721-ladsgroup.json
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply updates - bking@cumin2002 - T321310
  • 20:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P35943 and previous config saved to /var/cache/conftool/dbconfig/20221021-201900-ladsgroup.json
  • 20:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P35942 and previous config saved to /var/cache/conftool/dbconfig/20221021-201214-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P35941 and previous config saved to /var/cache/conftool/dbconfig/20221021-200353-ladsgroup.json
  • 19:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P35940 and previous config saved to /var/cache/conftool/dbconfig/20221021-195708-ladsgroup.json
  • 19:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35939 and previous config saved to /var/cache/conftool/dbconfig/20221021-194847-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35938 and previous config saved to /var/cache/conftool/dbconfig/20221021-194234-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35937 and previous config saved to /var/cache/conftool/dbconfig/20221021-194210-ladsgroup.json
  • 19:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35936 and previous config saved to /var/cache/conftool/dbconfig/20221021-194201-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2181 (T321312)', diff saved to https://phabricator.wikimedia.org/P35935 and previous config saved to /var/cache/conftool/dbconfig/20221021-193550-ladsgroup.json
  • 19:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 19:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35934 and previous config saved to /var/cache/conftool/dbconfig/20221021-193524-ladsgroup.json
  • 19:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P35933 and previous config saved to /var/cache/conftool/dbconfig/20221021-192704-ladsgroup.json
  • 19:24 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1002-dev.eqiad.wmnet
  • 19:22 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates (duration: 02m 55s)
  • 19:21 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2001-dev.wikimedia.org
  • 19:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P35932 and previous config saved to /var/cache/conftool/dbconfig/20221021-192016-ladsgroup.json
  • 19:20 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1002-dev.eqiad.wmnet
  • 19:19 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6]: wmf-puppet-dashboard updates
  • 19:19 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudbackup1001-dev.eqiad.wmnet
  • 19:18 taavi@deploy1002: Finished deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates (duration: 01m 12s)
  • 19:17 taavi@deploy1002: Started deploy [horizon/deploy@9d02cd6] (dev): wmf-puppet-dashboard updates
  • 19:15 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudbackup1001-dev.eqiad.wmnet
  • 19:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P35931 and previous config saved to /var/cache/conftool/dbconfig/20221021-191157-ladsgroup.json
  • 19:10 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2001-dev.wikimedia.org
  • 19:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318', diff saved to https://phabricator.wikimedia.org/P35930 and previous config saved to /var/cache/conftool/dbconfig/20221021-190509-ladsgroup.json
  • 18:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2004-dev.wikimedia.org
  • 18:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35929 and previous config saved to /var/cache/conftool/dbconfig/20221021-185651-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35928 and previous config saved to /var/cache/conftool/dbconfig/20221021-185032-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35927 and previous config saved to /var/cache/conftool/dbconfig/20221021-185003-ladsgroup.json
  • 18:49 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices2004-dev.wikimedia.org
  • 18:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35926 and previous config saved to /var/cache/conftool/dbconfig/20221021-184747-ladsgroup.json
  • 18:46 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2004-dev.wikimedia.org
  • 18:46 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudcontrol2005-dev.wikimedia.org
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35925 and previous config saved to /var/cache/conftool/dbconfig/20221021-184547-ladsgroup.json
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=varnish-fe
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=ats-tls
  • 18:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet,service=ats-be
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4049.ulsfo.wmnet,service=varnish-fe
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4049.ulsfo.wmnet,service=ats-tls
  • 18:40 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4049.ulsfo.wmnet,service=ats-be
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=varnish-fe
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=ats-tls
  • 18:39 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet,service=ats-be
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4047.ulsfo.wmnet,service=varnish-fe
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4047.ulsfo.wmnet,service=ats-tls
  • 18:39 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4047.ulsfo.wmnet,service=ats-be
  • 18:38 sukhe: pool new host cp4049: T317244
  • 18:38 sukhe: pool new host cp4047: T317244
  • 18:37 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2004-dev.wikimedia.org
  • 18:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cloudservices2005-dev.wikimedia.org
  • 18:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35924 and previous config saved to /var/cache/conftool/dbconfig/20221021-183241-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P35923 and previous config saved to /var/cache/conftool/dbconfig/20221021-183041-ladsgroup.json
  • 18:24 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 18:24 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx1001.wikimedia.org with reason: New Kernel
  • 18:22 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2005-dev.wikimedia.org
  • 18:21 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudservices2005-dev.wikimedia.org
  • 18:19 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 18:19 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mx2001.wikimedia.org with reason: New Kernel
  • 18:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35922 and previous config saved to /var/cache/conftool/dbconfig/20221021-181734-ladsgroup.json
  • 18:15 jhathaway@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 18:15 jhathaway@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on mirror1001.wikimedia.org with reason: New Kernel
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P35921 and previous config saved to /var/cache/conftool/dbconfig/20221021-181534-ladsgroup.json
  • 18:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35920 and previous config saved to /var/cache/conftool/dbconfig/20221021-180228-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35919 and previous config saved to /var/cache/conftool/dbconfig/20221021-180028-ladsgroup.json
  • 17:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35918 and previous config saved to /var/cache/conftool/dbconfig/20221021-175615-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35917 and previous config saved to /var/cache/conftool/dbconfig/20221021-175453-ladsgroup.json
  • 17:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35916 and previous config saved to /var/cache/conftool/dbconfig/20221021-175427-ladsgroup.json
  • 17:53 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P35915 and previous config saved to /var/cache/conftool/dbconfig/20221021-173921-ladsgroup.json
  • 17:32 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 17:29 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 17:26 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2003-dev.codfw.wmnet with reason: host reimage
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318', diff saved to https://phabricator.wikimedia.org/P35914 and previous config saved to /var/cache/conftool/dbconfig/20221021-172414-ladsgroup.json
  • 17:20 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: apply updates - bking@cumin2002 - T321310
  • 17:20 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply updates - bking@cumin2002 - T321310
  • 17:10 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2003-dev.codfw.wmnet with OS bullseye
  • 17:09 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl1002.eqiad.wmnet
  • 17:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35913 and previous config saved to /var/cache/conftool/dbconfig/20221021-170908-ladsgroup.json
  • 17:07 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 17:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35912 and previous config saved to /var/cache/conftool/dbconfig/20221021-170551-ladsgroup.json
  • 17:03 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2002-dev.codfw.wmnet with reason: host reimage
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1167 (T321312)', diff saved to https://phabricator.wikimedia.org/P35911 and previous config saved to /var/cache/conftool/dbconfig/20221021-170011-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126 (T321312)', diff saved to https://phabricator.wikimedia.org/P35910 and previous config saved to /var/cache/conftool/dbconfig/20221021-165930-ladsgroup.json
  • 16:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35909 and previous config saved to /var/cache/conftool/dbconfig/20221021-165045-ladsgroup.json
  • 16:46 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-ctrl1001.eqiad.wmnet
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P35908 and previous config saved to /var/cache/conftool/dbconfig/20221021-164424-ladsgroup.json
  • 16:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:42 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2002-dev.codfw.wmnet with OS bullseye
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311', diff saved to https://phabricator.wikimedia.org/P35907 and previous config saved to /var/cache/conftool/dbconfig/20221021-163538-ladsgroup.json
  • 16:31 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:29 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1126', diff saved to https://phabricator.wikimedia.org/P35906 and previous config saved to /var/cache/conftool/dbconfig/20221021-162917-ladsgroup.json
  • 16:29 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 16:27 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl1002.eqiad.wmnet on all recursors
  • 16:27 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 16:24 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:23 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:23 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1002.eqiad.wmnet
  • 16:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt2003-dev']
  • 16:22 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 16:22 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl1001.eqiad.wmnet on all recursors
  • 16:22 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2167:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35905 and previous config saved to /var/cache/conftool/dbconfig/20221021-162032-ladsgroup.json
  • 16:20 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 16:20 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-ctrl1001.eqiad.wmnet
  • 16:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1116.eqiad.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35899 and previous config saved to /var/cache/conftool/dbconfig/20221021-160246-ladsgroup.json
  • 16:02 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 15:59 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2002-dev']
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1044.eqiad.wmnet
  • 15:58 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2048.codfw.wmnet
  • 15:57 andrew@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2001-dev.codfw.wmnet with reason: host reimage
  • 15:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P35898 and previous config saved to /var/cache/conftool/dbconfig/20221021-155616-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P35897 and previous config saved to /var/cache/conftool/dbconfig/20221021-154740-ladsgroup.json
  • 15:46 cgoubert@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-eqiad
  • 15:45 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 15:41 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4047.ulsfo.wmnet with reason: host reimage
  • 15:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P35896 and previous config saved to /var/cache/conftool/dbconfig/20221021-154110-ladsgroup.json
  • 15:40 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 15:37 pt1979@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudvirt2001-dev']
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114', diff saved to https://phabricator.wikimedia.org/P35895 and previous config saved to /var/cache/conftool/dbconfig/20221021-153234-ladsgroup.json
  • 15:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T321312)', diff saved to https://phabricator.wikimedia.org/P35894 and previous config saved to /var/cache/conftool/dbconfig/20221021-152603-ladsgroup.json
  • 15:23 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt2001-dev']
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2166 (T321312)', diff saved to https://phabricator.wikimedia.org/P35893 and previous config saved to /var/cache/conftool/dbconfig/20221021-151945-ladsgroup.json
  • 15:19 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35892 and previous config saved to /var/cache/conftool/dbconfig/20221021-151920-ladsgroup.json
  • 15:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35890 and previous config saved to /var/cache/conftool/dbconfig/20221021-151727-ladsgroup.json
  • 15:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 15:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1114 (T321312)', diff saved to https://phabricator.wikimedia.org/P35889 and previous config saved to /var/cache/conftool/dbconfig/20221021-151104-ladsgroup.json
  • 15:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1114.eqiad.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35888 and previous config saved to /var/cache/conftool/dbconfig/20221021-151040-ladsgroup.json
  • 15:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4049.ulsfo.wmnet with OS buster
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P35887 and previous config saved to /var/cache/conftool/dbconfig/20221021-150413-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P35886 and previous config saved to /var/cache/conftool/dbconfig/20221021-145534-ladsgroup.json
  • 14:55 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1043.eqiad.wmnet
  • 14:54 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2047.codfw.wmnet
  • 14:49 btullis@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 14:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P35885 and previous config saved to /var/cache/conftool/dbconfig/20221021-144907-ladsgroup.json
  • 14:48 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2047.codfw.wmnet
  • 14:47 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2046.codfw.wmnet
  • 14:47 bblack@cumin1001: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4047
  • 14:46 bblack@cumin1001: START - Cookbook sre.network.configure-switch-interfaces for host cp4047
  • 14:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply updates - bking@cumin2002 - T321310
  • 14:41 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 14:40 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2046.codfw.wmnet
  • 14:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111', diff saved to https://phabricator.wikimedia.org/P35884 and previous config saved to /var/cache/conftool/dbconfig/20221021-144028-ladsgroup.json
  • 14:37 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4049.ulsfo.wmnet with reason: host reimage
  • 14:34 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2045.codfw.wmnet
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35883 and previous config saved to /var/cache/conftool/dbconfig/20221021-143400-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2164 (T321312)', diff saved to https://phabricator.wikimedia.org/P35882 and previous config saved to /var/cache/conftool/dbconfig/20221021-142742-ladsgroup.json
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35881 and previous config saved to /var/cache/conftool/dbconfig/20221021-142712-ladsgroup.json
  • 14:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35880 and previous config saved to /var/cache/conftool/dbconfig/20221021-142521-ladsgroup.json
  • 14:23 bblack@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2045.codfw.wmnet
  • 14:22 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1043.eqiad.wmnet
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=varnish-fe
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-tls
  • 14:22 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet,service=ats-be
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4037.ulsfo.wmnet,service=varnish-fe
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4037.ulsfo.wmnet,service=ats-tls
  • 14:22 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4037.ulsfo.wmnet,service=ats-be
  • 14:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updates - bking@cumin2002 - T321310
  • 14:21 sukhe: pool new host cp4037: T317244
  • 14:21 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2044.codfw.wmnet
  • 14:18 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1040.eqiad.wmnet
  • 14:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1111 (T321312)', diff saved to https://phabricator.wikimedia.org/P35879 and previous config saved to /var/cache/conftool/dbconfig/20221021-141815-ladsgroup.json
  • 14:18 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 14:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1111.eqiad.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35878 and previous config saved to /var/cache/conftool/dbconfig/20221021-141752-ladsgroup.json
  • 14:16 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 14:12 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4049.ulsfo.wmnet with OS buster
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P35877 and previous config saved to /var/cache/conftool/dbconfig/20221021-141206-ladsgroup.json
  • 14:11 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1040.eqiad.wmnet
  • 14:07 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2044.codfw.wmnet
  • 14:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P35876 and previous config saved to /var/cache/conftool/dbconfig/20221021-140245-ladsgroup.json
  • 14:00 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2043.codfw.wmnet
  • 14:00 mvernon@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be1042.eqiad.wmnet
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P35875 and previous config saved to /var/cache/conftool/dbconfig/20221021-135659-ladsgroup.json
  • 13:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104', diff saved to https://phabricator.wikimedia.org/P35874 and previous config saved to /var/cache/conftool/dbconfig/20221021-134737-ladsgroup.json
  • 13:45 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1042.eqiad.wmnet
  • 13:44 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2043.codfw.wmnet
  • 13:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35873 and previous config saved to /var/cache/conftool/dbconfig/20221021-134153-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2163 (T321312)', diff saved to https://phabricator.wikimedia.org/P35872 and previous config saved to /var/cache/conftool/dbconfig/20221021-133534-ladsgroup.json
  • 13:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35871 and previous config saved to /var/cache/conftool/dbconfig/20221021-133509-ladsgroup.json
  • 13:34 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1041.eqiad.wmnet
  • 13:34 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes,service=kubesvc,name=kubernetes1005.eqiad.wmnet
  • 13:33 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2042.codfw.wmnet
  • 13:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35870 and previous config saved to /var/cache/conftool/dbconfig/20221021-133231-ladsgroup.json
  • 13:32 claime: kubernetes1005:~$ sudo systemctl reset-failed ifup@ens13.service T273026
  • 13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updates - bking@cumin2002 - T321310
  • 13:28 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be1041.eqiad.wmnet
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1104 (T321312)', diff saved to https://phabricator.wikimedia.org/P35869 and previous config saved to /var/cache/conftool/dbconfig/20221021-132716-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 13:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2042.codfw.wmnet
  • 13:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1104.eqiad.wmnet with reason: Maintenance
  • 13:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35868 and previous config saved to /var/cache/conftool/dbconfig/20221021-132652-ladsgroup.json
  • 13:25 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2041.codfw.wmnet
  • 13:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P35867 and previous config saved to /var/cache/conftool/dbconfig/20221021-132003-ladsgroup.json
  • 13:19 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 13:18 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2041.codfw.wmnet
  • 13:17 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 13:16 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2040.codfw.wmnet
  • 13:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P35866 and previous config saved to /var/cache/conftool/dbconfig/20221021-131145-ladsgroup.json
  • 13:09 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-single for host ms-be2040.codfw.wmnet
  • 13:07 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P35865 and previous config saved to /var/cache/conftool/dbconfig/20221021-130456-ladsgroup.json
  • 13:00 cgoubert@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
  • 12:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318', diff saved to https://phabricator.wikimedia.org/P35864 and previous config saved to /var/cache/conftool/dbconfig/20221021-125639-ladsgroup.json
  • 12:55 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2024.codfw.wmnet
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35863 and previous config saved to /var/cache/conftool/dbconfig/20221021-124950-ladsgroup.json
  • 12:48 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2024.codfw.wmnet
  • 12:47 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2023.codfw.wmnet
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2162 (T321312)', diff saved to https://phabricator.wikimedia.org/P35862 and previous config saved to /var/cache/conftool/dbconfig/20221021-124327-ladsgroup.json
  • 12:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 12:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35861 and previous config saved to /var/cache/conftool/dbconfig/20221021-124302-ladsgroup.json
  • 12:42 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2023.codfw.wmnet
  • 12:42 dcausse: restarting blazegraph on wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 12:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35860 and previous config saved to /var/cache/conftool/dbconfig/20221021-124132-ladsgroup.json
  • 12:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35859 and previous config saved to /var/cache/conftool/dbconfig/20221021-123815-ladsgroup.json
  • 12:35 claime: rebooted kubernetes2006.codfw.wmnet manually - root cause T273026
  • 12:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P35858 and previous config saved to /var/cache/conftool/dbconfig/20221021-122755-ladsgroup.json
  • 12:23 cgoubert@cumin1001: conftool action : set/pooled=yes; selector: dc=codfw,cluster=kubernetes,service=kubesvc,name=kubernetes2006.codfw.wmnet
  • 12:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35857 and previous config saved to /var/cache/conftool/dbconfig/20221021-122308-ladsgroup.json
  • 12:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P35856 and previous config saved to /var/cache/conftool/dbconfig/20221021-121249-ladsgroup.json
  • 12:11 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 12:09 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes2006.codfw.wmnet
  • 12:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35855 and previous config saved to /var/cache/conftool/dbconfig/20221021-120802-ladsgroup.json
  • 12:03 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubernetes2006.codfw.wmnet
  • 11:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35854 and previous config saved to /var/cache/conftool/dbconfig/20221021-115742-ladsgroup.json
  • 11:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35853 and previous config saved to /var/cache/conftool/dbconfig/20221021-115255-ladsgroup.json
  • 11:52 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:51 Emperor: rolling reboot of codfw swift frontends re October reboots
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2161 (T321312)', diff saved to https://phabricator.wikimedia.org/P35852 and previous config saved to /var/cache/conftool/dbconfig/20221021-115128-ladsgroup.json
  • 11:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 11:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T321312)', diff saved to https://phabricator.wikimedia.org/P35851 and previous config saved to /var/cache/conftool/dbconfig/20221021-115103-ladsgroup.json
  • 11:48 mvernon@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-cluster (exit_code=0)
  • 11:47 klausman@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 11:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35850 and previous config saved to /var/cache/conftool/dbconfig/20221021-114553-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T321312)', diff saved to https://phabricator.wikimedia.org/P35849 and previous config saved to /var/cache/conftool/dbconfig/20221021-114429-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35848 and previous config saved to /var/cache/conftool/dbconfig/20221021-114405-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P35847 and previous config saved to /var/cache/conftool/dbconfig/20221021-113556-ladsgroup.json
  • 11:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318', diff saved to https://phabricator.wikimedia.org/P35846 and previous config saved to /var/cache/conftool/dbconfig/20221021-112859-ladsgroup.json
  • 11:27 mvernon@cumin1001: START - Cookbook sre.hosts.reboot-cluster
  • 11:27 Emperor: rolling reboot of eqiad swift frontends re October reboots
  • 11:22 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P35845 and previous config saved to /var/cache/conftool/dbconfig/20221021-112050-ladsgroup.json
  • 10:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35840 and previous config saved to /var/cache/conftool/dbconfig/20221021-105845-ladsgroup.json
  • 10:58 cgoubert@cumin1001: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2001.codfw.wmnet
  • 10:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35839 and previous config saved to /var/cache/conftool/dbconfig/20221021-105529-ladsgroup.json
  • 10:49 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:49 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 10:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P35838 and previous config saved to /var/cache/conftool/dbconfig/20221021-104349-ladsgroup.json
  • 10:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P35837 and previous config saved to /var/cache/conftool/dbconfig/20221021-104022-ladsgroup.json
  • 10:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P35836 and previous config saved to /var/cache/conftool/dbconfig/20221021-102842-ladsgroup.json
  • 10:25 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 10:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311', diff saved to https://phabricator.wikimedia.org/P35835 and previous config saved to /var/cache/conftool/dbconfig/20221021-102516-ladsgroup.json
  • 10:24 jynus: restart of ms-backup hosts
  • 10:18 cgoubert@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 10:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T321312)', diff saved to https://phabricator.wikimedia.org/P35834 and previous config saved to /var/cache/conftool/dbconfig/20221021-101336-ladsgroup.json
  • 10:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35833 and previous config saved to /var/cache/conftool/dbconfig/20221021-101009-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2152 (T321312)', diff saved to https://phabricator.wikimedia.org/P35832 and previous config saved to /var/cache/conftool/dbconfig/20221021-100813-ladsgroup.json
  • 10:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 10:07 btullis@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker
  • 10:07 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 10:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:03 elukey@cumin1001: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-staging-worker
  • 10:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3318 (T321312)', diff saved to https://phabricator.wikimedia.org/P35831 and previous config saved to /var/cache/conftool/dbconfig/20221021-100305-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1099:3311 (T321312)', diff saved to https://phabricator.wikimedia.org/P35830 and previous config saved to /var/cache/conftool/dbconfig/20221021-100137-ladsgroup.json
  • 10:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1099.eqiad.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 09:56 cgoubert@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:55 btullis@cumin1001: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:dse-k8s-worker
  • 09:54 btullis@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:54 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 09:46 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 09:43 klausman@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 09:18 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:16 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 09:14 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 09:11 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 09:10 jelto@cumin1001: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:10 jynus: finished rolling restart of dbprov hosts
  • 09:09 klausman@cumin1001: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 08:52 jynus: finished rolling restart of backup hosts
  • 08:47 klausman@cumin1001: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 08:40 elukey@cumin1001: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 07:37 jynus: start of rolling restart of backup hosts
  • 07:20 oblivian@deploy1002: Finished scap: Backport for Fix broken links (duration: 07m 11s)
  • 07:13 oblivian@deploy1002: oblivian and oblivian: Backport for Fix broken links synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 07:13 oblivian@deploy1002: Started scap: Backport for Fix broken links
  • 07:00 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36692
  • 06:58 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36692
  • 06:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35829 and previous config saved to /var/cache/conftool/dbconfig/20221021-062817-ladsgroup.json
  • 06:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P35828 and previous config saved to /var/cache/conftool/dbconfig/20221021-061311-ladsgroup.json
  • 05:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P35827 and previous config saved to /var/cache/conftool/dbconfig/20221021-055804-ladsgroup.json
  • 05:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35826 and previous config saved to /var/cache/conftool/dbconfig/20221021-054258-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2179 (T321312)', diff saved to https://phabricator.wikimedia.org/P35825 and previous config saved to /var/cache/conftool/dbconfig/20221021-053636-ladsgroup.json
  • 05:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 05:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35824 and previous config saved to /var/cache/conftool/dbconfig/20221021-053611-ladsgroup.json
  • 05:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P35823 and previous config saved to /var/cache/conftool/dbconfig/20221021-052104-ladsgroup.json
  • 05:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P35822 and previous config saved to /var/cache/conftool/dbconfig/20221021-050558-ladsgroup.json
  • 04:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35821 and previous config saved to /var/cache/conftool/dbconfig/20221021-045051-ladsgroup.json
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2172 (T321312)', diff saved to https://phabricator.wikimedia.org/P35820 and previous config saved to /var/cache/conftool/dbconfig/20221021-044433-ladsgroup.json
  • 04:44 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35819 and previous config saved to /var/cache/conftool/dbconfig/20221021-044407-ladsgroup.json
  • 04:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P35818 and previous config saved to /var/cache/conftool/dbconfig/20221021-042901-ladsgroup.json
  • 04:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P35817 and previous config saved to /var/cache/conftool/dbconfig/20221021-041354-ladsgroup.json
  • 03:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35816 and previous config saved to /var/cache/conftool/dbconfig/20221021-035848-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2155 (T321312)', diff saved to https://phabricator.wikimedia.org/P35815 and previous config saved to /var/cache/conftool/dbconfig/20221021-035120-ladsgroup.json
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 03:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 03:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35814 and previous config saved to /var/cache/conftool/dbconfig/20221021-035050-ladsgroup.json
  • 03:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P35813 and previous config saved to /var/cache/conftool/dbconfig/20221021-033544-ladsgroup.json
  • 03:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P35812 and previous config saved to /var/cache/conftool/dbconfig/20221021-032037-ladsgroup.json
  • 02:48 cstone: civicrm upgraded from 3e24d6f7 to 89a46665
  • 02:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P35808 and previous config saved to /var/cache/conftool/dbconfig/20221021-024303-ladsgroup.json
  • 02:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314', diff saved to https://phabricator.wikimedia.org/P35807 and previous config saved to /var/cache/conftool/dbconfig/20221021-022757-ladsgroup.json
  • 02:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35806 and previous config saved to /var/cache/conftool/dbconfig/20221021-021250-ladsgroup.json
  • 02:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35805 and previous config saved to /var/cache/conftool/dbconfig/20221021-020733-ladsgroup.json
  • 01:55 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35804 and previous config saved to /var/cache/conftool/dbconfig/20221021-015503-ladsgroup.json
  • 01:52 ryankemper@cumin1001: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 01:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35803 and previous config saved to /var/cache/conftool/dbconfig/20221021-015226-ladsgroup.json
  • 01:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P35802 and previous config saved to /var/cache/conftool/dbconfig/20221021-013957-ladsgroup.json
  • 01:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35801 and previous config saved to /var/cache/conftool/dbconfig/20221021-013720-ladsgroup.json
  • 01:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P35800 and previous config saved to /var/cache/conftool/dbconfig/20221021-012450-ladsgroup.json
  • 01:24 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 01:22 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 01:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35799 and previous config saved to /var/cache/conftool/dbconfig/20221021-012213-ladsgroup.json
  • 01:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35798 and previous config saved to /var/cache/conftool/dbconfig/20221021-011452-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2138:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35797 and previous config saved to /var/cache/conftool/dbconfig/20221021-011324-ladsgroup.json
  • 01:13 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 01:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35796 and previous config saved to /var/cache/conftool/dbconfig/20221021-011259-ladsgroup.json
  • 01:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35795 and previous config saved to /var/cache/conftool/dbconfig/20221021-010944-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1199 (T321312)', diff saved to https://phabricator.wikimedia.org/P35794 and previous config saved to /var/cache/conftool/dbconfig/20221021-010325-ladsgroup.json
  • 01:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 01:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35793 and previous config saved to /var/cache/conftool/dbconfig/20221021-010301-ladsgroup.json
  • 01:01 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35792 and previous config saved to /var/cache/conftool/dbconfig/20221021-005752-ladsgroup.json
  • 00:48 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs4008.ulsfo.wmnet with OS bullseye
  • 00:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P35791 and previous config saved to /var/cache/conftool/dbconfig/20221021-004754-ladsgroup.json
  • 00:47 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt2001-dev.codfw.wmnet with OS bullseye
  • 00:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35790 and previous config saved to /var/cache/conftool/dbconfig/20221021-004246-ladsgroup.json
  • 00:39 ryankemper@cumin1001: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 00:38 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 00:34 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 00:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P35789 and previous config saved to /var/cache/conftool/dbconfig/20221021-003247-ladsgroup.json
  • 00:32 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 00:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35788 and previous config saved to /var/cache/conftool/dbconfig/20221021-002739-ladsgroup.json
  • 00:27 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs4008.ulsfo.wmnet with reason: host reimage
  • 00:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35787 and previous config saved to /var/cache/conftool/dbconfig/20221021-002624-ladsgroup.json
  • 00:25 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:20 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4047.ulsfo.wmnet with OS buster
  • 00:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35786 and previous config saved to /var/cache/conftool/dbconfig/20221021-001740-ladsgroup.json
  • 00:16 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1190 (T321312)', diff saved to https://phabricator.wikimedia.org/P35785 and previous config saved to /var/cache/conftool/dbconfig/20221021-001123-ladsgroup.json
  • 00:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314', diff saved to https://phabricator.wikimedia.org/P35784 and previous config saved to /var/cache/conftool/dbconfig/20221021-001117-ladsgroup.json
  • 00:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 00:11 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 00:07 robh@cumin2002: START - Cookbook sre.hosts.reimage for host lvs4008.ulsfo.wmnet with OS bullseye
  • 00:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 00:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35783 and previous config saved to /var/cache/conftool/dbconfig/20221021-000636-ladsgroup.json
  • 00:06 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host lvs4008
  • 00:06 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4008
  • 00:05 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 00:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4047.ulsfo.wmnet with OS buster
  • 00:01 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
  • 00:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage

2022-10-20

  • 23:58 robh@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host lvs4008
  • 23:58 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host lvs4008
  • 23:41 sukhe: COMPLETED: sudo cumin 'A:installserver' 'run-puppet-agent -q' for Gerrit 845074
  • 23:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35780 and previous config saved to /var/cache/conftool/dbconfig/20221020-234104-ladsgroup.json
  • 23:39 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:38 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:38 sukhe: sudo cumin 'A:installserver' 'run-puppet-agent -q' for Gerrit 845074
  • 23:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149', diff saved to https://phabricator.wikimedia.org/P35779 and previous config saved to /var/cache/conftool/dbconfig/20221020-233623-ladsgroup.json
  • 23:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35778 and previous config saved to /var/cache/conftool/dbconfig/20221020-233452-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35777 and previous config saved to /var/cache/conftool/dbconfig/20221020-233325-ladsgroup.json
  • 23:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35776 and previous config saved to /var/cache/conftool/dbconfig/20221020-233300-ladsgroup.json
  • 23:32 robh@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bullseye
  • 23:31 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:31 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:29 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:25 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:25 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:23 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 23:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35775 and previous config saved to /var/cache/conftool/dbconfig/20221020-232116-ladsgroup.json
  • 23:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P35774 and previous config saved to /var/cache/conftool/dbconfig/20221020-231754-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35773 and previous config saved to /var/cache/conftool/dbconfig/20221020-231446-ladsgroup.json
  • 23:14 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1149.eqiad.wmnet with reason: Maintenance
  • 23:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35772 and previous config saved to /var/cache/conftool/dbconfig/20221020-231422-ladsgroup.json
  • 23:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20221020-230242-ladsgroup.json
  • 22:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P35771 and previous config saved to /var/cache/conftool/dbconfig/20221020-225916-ladsgroup.json
  • 22:50 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2003-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2002-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt2001-dev.codfw.wmnet
  • 22:49 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet
  • 22:48 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1002.eqiad.wmnet
  • 22:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35770 and previous config saved to /var/cache/conftool/dbconfig/20221020-224736-ladsgroup.json
  • 22:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148', diff saved to https://phabricator.wikimedia.org/P35769 and previous config saved to /var/cache/conftool/dbconfig/20221020-224409-ladsgroup.json
  • 22:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1003.eqiad.wmnet
  • 22:43 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1002.eqiad.wmnet
  • 22:42 andrew@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 22:40 cstone: civicrm upgraded from c96dd3ae to 3e24d6f7
  • 22:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2136 (T321312)', diff saved to https://phabricator.wikimedia.org/P35768 and previous config saved to /var/cache/conftool/dbconfig/20221020-224003-ladsgroup.json
  • 22:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35767 and previous config saved to /var/cache/conftool/dbconfig/20221020-223937-ladsgroup.json
  • 22:36 andrew@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudvirt-wdqs1001.eqiad.wmnet
  • 22:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35766 and previous config saved to /var/cache/conftool/dbconfig/20221020-222903-ladsgroup.json
  • 22:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P35765 and previous config saved to /var/cache/conftool/dbconfig/20221020-222431-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1148 (T321312)', diff saved to https://phabricator.wikimedia.org/P35764 and previous config saved to /var/cache/conftool/dbconfig/20221020-222253-ladsgroup.json
  • 22:22 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1148.eqiad.wmnet with reason: Maintenance
  • 22:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35763 and previous config saved to /var/cache/conftool/dbconfig/20221020-222229-ladsgroup.json
  • 22:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119', diff saved to https://phabricator.wikimedia.org/P35762 and previous config saved to /var/cache/conftool/dbconfig/20221020-220924-ladsgroup.json
  • 22:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P35761 and previous config saved to /var/cache/conftool/dbconfig/20221020-220722-ladsgroup.json
  • 21:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:57 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS buster
  • 21:55 ryankemper@cumin1001: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin1001 - T321310
  • 21:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35760 and previous config saved to /var/cache/conftool/dbconfig/20221020-215418-ladsgroup.json
  • 21:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147', diff saved to https://phabricator.wikimedia.org/P35759 and previous config saved to /var/cache/conftool/dbconfig/20221020-215216-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2119 (T321312)', diff saved to https://phabricator.wikimedia.org/P35758 and previous config saved to /var/cache/conftool/dbconfig/20221020-214750-ladsgroup.json
  • 21:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2119.codfw.wmnet with reason: Maintenance
  • 21:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35757 and previous config saved to /var/cache/conftool/dbconfig/20221020-214725-ladsgroup.json
  • 21:40 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35756 and previous config saved to /var/cache/conftool/dbconfig/20221020-213709-ladsgroup.json
  • 21:35 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35755 and previous config saved to /var/cache/conftool/dbconfig/20221020-213218-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1147 (T321312)', diff saved to https://phabricator.wikimedia.org/P35754 and previous config saved to /var/cache/conftool/dbconfig/20221020-213050-ladsgroup.json
  • 21:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1147.eqiad.wmnet with reason: Maintenance
  • 21:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35753 and previous config saved to /var/cache/conftool/dbconfig/20221020-213025-ladsgroup.json
  • 21:25 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:17 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:17 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35752 and previous config saved to /var/cache/conftool/dbconfig/20221020-211712-ladsgroup.json
  • 21:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:16 TheresNoTime: close UTC late window
  • 21:16 samtar@deploy1002: Finished scap: Backport for statsv: Add error counters to delete/tags .js (T320543) (duration: 08m 26s)
  • 21:15 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P35751 and previous config saved to /var/cache/conftool/dbconfig/20221020-211519-ladsgroup.json
  • 21:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netmon1003.wikimedia.org
  • 21:13 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:11 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:10 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host netmon1003.wikimedia.org
  • 21:08 samtar@deploy1002: samtar and samtar: Backport for statsv: Add error counters to delete/tags .js (T320543) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:08 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1004.eqiad.wmnet
  • 21:07 samtar@deploy1002: Started scap: Backport for statsv: Add error counters to delete/tags .js (T320543)
  • 21:07 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:06 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 21:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: apply updates - bking@cumin2002 - T321310
  • 21:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35750 and previous config saved to /var/cache/conftool/dbconfig/20221020-210308-ladsgroup.json
  • 21:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35749 and previous config saved to /var/cache/conftool/dbconfig/20221020-210205-ladsgroup.json
  • 21:01 TheresNoTime: extending UTC late backport window
  • 21:01 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite1004.eqiad.wmnet
  • 21:00 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite1005.eqiad.wmnet
  • 21:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3314', diff saved to https://phabricator.wikimedia.org/P35748 and previous config saved to /var/cache/conftool/dbconfig/20221020-210012-ladsgroup.json
  • 20:59 hoo@deploy1002: Finished scap: Backport for Only generate QS maxlag for pooled servers (T315423 T238751) (duration: 07m 12s)
  • 20:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T321312)', diff saved to https://phabricator.wikimedia.org/P35747 and previous config saved to /var/cache/conftool/dbconfig/20221020-205532-ladsgroup.json
  • 20:37 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:36 herron@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host graphite2004.codfw.wmnet
  • 20:36 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 20:35 samtar@deploy1002: Finished scap: Backport for ReplyLinksController: Skip empty reply buttons container (T321185) (duration: 05m 23s)
  • 20:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1001.eqiad.wmnet
  • 20:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P35741 and previous config saved to /var/cache/conftool/dbconfig/20221020-203255-ladsgroup.json
  • 20:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P35739 and previous config saved to /var/cache/conftool/dbconfig/20221020-202344-ladsgroup.json
  • 20:22 thcipriani@deploy1002: Finished scap: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223) (duration: 14m 12s)
  • 20:22 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1001.eqiad.wmnet
  • 20:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:21 dzahn@cumin2002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:eqiad and (A:gitlab-runner)
  • 20:19 robh@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35737 and previous config saved to /var/cache/conftool/dbconfig/20221020-201748-ladsgroup.json
  • 20:15 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1002.eqiad.wmnet
  • 20:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4049.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:14 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4047.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35736 and previous config saved to /var/cache/conftool/dbconfig/20221020-200947-ladsgroup.json
  • 20:09 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 20:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312', diff saved to https://phabricator.wikimedia.org/P35735 and previous config saved to /var/cache/conftool/dbconfig/20221020-200838-ladsgroup.json
  • 20:08 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:08 thcipriani@deploy1002: Started scap: Backport for Updates to Wikipedia wordmark/taglines and project icons (T319223)
  • 20:07 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1002.eqiad.wmnet
  • 20:05 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 20:05 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe1003.eqiad.wmnet
  • 20:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2106 (T321312)', diff saved to https://phabricator.wikimedia.org/P35734 and previous config saved to /var/cache/conftool/dbconfig/20221020-200321-ladsgroup.json
  • 20:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2106.codfw.wmnet with reason: Maintenance
  • 20:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2175 (T318955)', diff saved to https://phabricator.wikimedia.org/P35733 and previous config saved to /var/cache/conftool/dbconfig/20221020-200205-ladsgroup.json
  • 20:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:01 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:codfw and (A:gitlab-runner)
  • 20:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35732 and previous config saved to /var/cache/conftool/dbconfig/20221020-200143-ladsgroup.json
  • 20:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4049.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 20:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 20:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4047.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2099.codfw.wmnet with reason: Maintenance
  • 19:59 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:57 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:57 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe1003.eqiad.wmnet
  • 19:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1146:3312 (T321312)', diff saved to https://phabricator.wikimedia.org/P35731 and previous config saved to /var/cache/conftool/dbconfig/20221020-195331-ladsgroup.json
  • 19:50 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:41 dzahn@cumin2002: START - Cookbook sre.gitlab.reboot-runner rolling reboot on A:codfw and (A:gitlab-runner)
  • 19:39 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-be2001.codfw.wmnet
  • 19:38 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:38 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1145.eqiad.wmnet with reason: Maintenance
  • 19:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35727 and previous config saved to /var/cache/conftool/dbconfig/20221020-193756-ladsgroup.json
  • 19:37 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:33 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2001.codfw.wmnet
  • 19:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312', diff saved to https://phabricator.wikimedia.org/P35726 and previous config saved to /var/cache/conftool/dbconfig/20221020-193130-ladsgroup.json
  • 19:29 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: apply updates - bking@cumin2002 - T321310
  • 19:28 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:27 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:26 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2001.codfw.wmnet
  • 19:23 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host thanos-fe2002.codfw.wmnet
  • 19:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35725 and previous config saved to /var/cache/conftool/dbconfig/20221020-192249-ladsgroup.json
  • 19:16 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host thanos-fe2002.codfw.wmnet
  • 19:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35724 and previous config saved to /var/cache/conftool/dbconfig/20221020-191624-ladsgroup.json
  • 19:15 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cp4029.ulsfo.wmnet
  • 19:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:13 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:12 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS buster
  • 19:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35723 and previous config saved to /var/cache/conftool/dbconfig/20221020-190743-ladsgroup.json
  • 19:02 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS buster
  • 19:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2170:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35722 and previous config saved to /var/cache/conftool/dbconfig/20221020-190101-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35721 and previous config saved to /var/cache/conftool/dbconfig/20221020-190039-ladsgroup.json
  • 18:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35720 and previous config saved to /var/cache/conftool/dbconfig/20221020-185236-ladsgroup.json
  • 18:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35719 and previous config saved to /var/cache/conftool/dbconfig/20221020-185021-ladsgroup.json
  • 18:49 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts cp4029.ulsfo.wmnet
  • 18:46 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35718 and previous config saved to /var/cache/conftool/dbconfig/20221020-184547-ladsgroup.json
  • 18:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P35717 and previous config saved to /var/cache/conftool/dbconfig/20221020-184533-ladsgroup.json
  • 18:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:37 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:36 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P35716 and previous config saved to /var/cache/conftool/dbconfig/20221020-183515-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P35715 and previous config saved to /var/cache/conftool/dbconfig/20221020-183040-ladsgroup.json
  • 18:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P35714 and previous config saved to /var/cache/conftool/dbconfig/20221020-183026-ladsgroup.json
  • 18:28 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:26 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:25 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4005.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314', diff saved to https://phabricator.wikimedia.org/P35713 and previous config saved to /var/cache/conftool/dbconfig/20221020-182008-ladsgroup.json
  • 18:18 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:18 samtar@deploy1002: Finished scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) (duration: 08m 08s)
  • 18:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35712 and previous config saved to /var/cache/conftool/dbconfig/20221020-181630-ladsgroup.json
  • 18:16 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P35711 and previous config saved to /var/cache/conftool/dbconfig/20221020-181533-ladsgroup.json
  • 18:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35710 and previous config saved to /var/cache/conftool/dbconfig/20221020-181520-ladsgroup.json
  • 18:10 samtar@deploy1002: samtar and samtar: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 18:09 samtar@deploy1002: Started scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974)
  • 18:06 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cp4037
  • 18:06 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cp4037
  • 18:06 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cp4037.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:05 TheresNoTime: Backporting gerrit:845012 for T310974 to wmf.6
  • 18:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35709 and previous config saved to /var/cache/conftool/dbconfig/20221020-180502-ladsgroup.json
  • 18:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35708 and previous config saved to /var/cache/conftool/dbconfig/20221020-180123-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35707 and previous config saved to /var/cache/conftool/dbconfig/20221020-180027-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2148 (T318955)', diff saved to https://phabricator.wikimedia.org/P35706 and previous config saved to /var/cache/conftool/dbconfig/20221020-180021-ladsgroup.json
  • 18:00 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 18:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35705 and previous config saved to /var/cache/conftool/dbconfig/20221020-175959-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T321312)', diff saved to https://phabricator.wikimedia.org/P35704 and previous config saved to /var/cache/conftool/dbconfig/20221020-175854-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1194 (T318950)', diff saved to https://phabricator.wikimedia.org/P35703 and previous config saved to /var/cache/conftool/dbconfig/20221020-175817-ladsgroup.json
  • 17:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318950)', diff saved to https://phabricator.wikimedia.org/P35702 and previous config saved to /var/cache/conftool/dbconfig/20221020-175755-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3314 (T321312)', diff saved to https://phabricator.wikimedia.org/P35701 and previous config saved to /var/cache/conftool/dbconfig/20221020-175726-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 17:53 robh@cumin2002: START - Cookbook sre.hosts.provision for host cp4037.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:53 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Maintenance
  • 17:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142 (T321312)', diff saved to https://phabricator.wikimedia.org/P35700 and previous config saved to /var/cache/conftool/dbconfig/20221020-175244-ladsgroup.json
  • 17:52 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:48 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35699 and previous config saved to /var/cache/conftool/dbconfig/20221020-174617-ladsgroup.json
  • 17:46 mutante: phabricator - disabling git-ssh URIs for repo 'phabricator-translations' https://phabricator.wikimedia.org/source/phabricator-translation - T296022
  • 17:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312', diff saved to https://phabricator.wikimedia.org/P35698 and previous config saved to /var/cache/conftool/dbconfig/20221020-174453-ladsgroup.json
  • 17:44 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P35697 and previous config saved to /var/cache/conftool/dbconfig/20221020-174248-ladsgroup.json
  • 17:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P35696 and previous config saved to /var/cache/conftool/dbconfig/20221020-173737-ladsgroup.json
  • 17:36 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:35 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35695 and previous config saved to /var/cache/conftool/dbconfig/20221020-173111-ladsgroup.json
  • 17:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P35693 and previous config saved to /var/cache/conftool/dbconfig/20221020-172741-ladsgroup.json
  • 17:26 dzahn@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T321312)', diff saved to https://phabricator.wikimedia.org/P35692 and previous config saved to /var/cache/conftool/dbconfig/20221020-172445-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T321312)', diff saved to https://phabricator.wikimedia.org/P35691 and previous config saved to /var/cache/conftool/dbconfig/20221020-172419-ladsgroup.json
  • 17:23 dzahn@cumin2002: START - Cookbook sre.dns.netbox
  • 17:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1142', diff saved to https://phabricator.wikimedia.org/P35690 and previous config saved to /var/cache/conftool/dbconfig/20221020-172231-ladsgroup.json
  • 17:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 10310
  • 17:15 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 10310
  • 17:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2138:3312 (T318955)', diff saved to https://phabricator.wikimedia.org/P35689 and previous config saved to /var/cache/conftool/dbconfig/20221020-171439-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T318950)', diff saved to https://phabricator.wikimedia.org/P35688 and previous config saved to /var/cache/conftool/dbconfig/20221020-171234-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P35679 and previous config saved to /var/cache/conftool/dbconfig/20221020-165456-ladsgroup.json
  • 16:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35678 and previous config saved to /var/cache/conftool/dbconfig/20221020-165406-ladsgroup.json
  • 16:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141', diff saved to https://phabricator.wikimedia.org/P35677 and previous config saved to /var/cache/conftool/dbconfig/20221020-164515-ladsgroup.json
  • 16:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P35676 and previous config saved to /var/cache/conftool/dbconfig/20221020-164339-ladsgroup.json
  • 16:40 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1005.eqiad.wmnet
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P35675 and previous config saved to /var/cache/conftool/dbconfig/20221020-163950-ladsgroup.json
  • 16:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T321312)', diff saved to https://phabricator.wikimedia.org/P35674 and previous config saved to /var/cache/conftool/dbconfig/20221020-163900-ladsgroup.json
  • 16:22 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 16:20 mutante: phab1001 (phabricator) - remove LVS IP from loopback - ip addr del 208.80.154.250 dev lo - T296022
  • 16:20 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36351
  • 16:18 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36351
  • 16:18 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 16:17 urbanecm@deploy1002: Finished scap: Backport for MenteeOverview: Fix link under "reverted" column (T321321) (duration: 04m 22s)
  • 16:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35668 and previous config saved to /var/cache/conftool/dbconfig/20221020-161648-ladsgroup.json
  • 16:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1141 (T321312)', diff saved to https://phabricator.wikimedia.org/P35667 and previous config saved to /var/cache/conftool/dbconfig/20221020-161502-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T318955)', diff saved to https://phabricator.wikimedia.org/P35666 and previous config saved to /var/cache/conftool/dbconfig/20221020-161326-ladsgroup.json
  • 16:13 urbanecm@deploy1002: Started scap: Backport for MenteeOverview: Fix link under "reverted" column (T321321)
  • 16:12 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 16:11 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 16:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2126 (T318955)', diff saved to https://phabricator.wikimedia.org/P35665 and previous config saved to /var/cache/conftool/dbconfig/20221020-161106-ladsgroup.json
  • 16:11 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 16:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T318955)', diff saved to https://phabricator.wikimedia.org/P35664 and previous config saved to /var/cache/conftool/dbconfig/20221020-161029-ladsgroup.json
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1141 (T321312)', diff saved to https://phabricator.wikimedia.org/P35663 and previous config saved to /var/cache/conftool/dbconfig/20221020-160832-ladsgroup.json
  • 16:08 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:08 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1141.eqiad.wmnet with reason: Maintenance
  • 16:08 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 16:08 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138 (T321312)', diff saved to https://phabricator.wikimedia.org/P35662 and previous config saved to /var/cache/conftool/dbconfig/20221020-160808-ladsgroup.json
  • 16:05 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 16:02 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 16:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35661 and previous config saved to /var/cache/conftool/dbconfig/20221020-160142-ladsgroup.json
  • 16:01 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 16:00 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host failoid2002.codfw.wmnet
  • 16:00 volans@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 15:59 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 15:55 mutante: phabricator (diffusion) - clicked "disable" and then "deactivate" on Blubber diffusion repo. It's now "inactive"; publishing and syncing have been disabled https://phabricator.wikimedia.org/source/blubber/ T317820
  • 15:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P35660 and previous config saved to /var/cache/conftool/dbconfig/20221020-155523-ladsgroup.json
  • 15:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1138', diff saved to https://phabricator.wikimedia.org/P35659 and previous config saved to /var/cache/conftool/dbconfig/20221020-155302-ladsgroup.json
  • 15:49 herron@cumin1001: START - Cookbook sre.hosts.reboot-single for host prometheus2005.codfw.wmnet
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1174 (T318950)', diff saved to https://phabricator.wikimedia.org/P35658 and previous config saved to /var/cache/conftool/dbconfig/20221020-154724-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 15:47 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35657 and previous config saved to /var/cache/conftool/dbconfig/20221020-154644-ladsgroup.json
  • 15:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35656 and previous config saved to /var/cache/conftool/dbconfig/20221020-154635-ladsgroup.json
  • 15:44 herron@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 15:41 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P35655 and previous config saved to /var/cache/conftool/dbconfig/20221020-154016-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T321312)', diff saved to https://phabricator.wikimedia.org/P35654 and previous config saved to /var/cache/conftool/dbconfig/20221020-154006-ladsgroup.json
  • 15:40 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:06 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457) (duration: 04m 30s)
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2125 (T318955)', diff saved to https://phabricator.wikimedia.org/P35644 and previous config saved to /var/cache/conftool/dbconfig/20221020-150537-ladsgroup.json
  • 15:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 15:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35643 and previous config saved to /var/cache/conftool/dbconfig/20221020-150515-ladsgroup.json
  • 15:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P35642 and previous config saved to /var/cache/conftool/dbconfig/20221020-150329-ladsgroup.json
  • 15:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P35641 and previous config saved to /var/cache/conftool/dbconfig/20221020-150201-ladsgroup.json
  • 15:01 urbanecm@deploy1002: urbanecm and urbanecm: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 15:01 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: Reorder variables, GrowthExperiments: Define wgGEMentorshipUseIsActiveFlag (T318457)
  • 15:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35640 and previous config saved to /var/cache/conftool/dbconfig/20221020-150125-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1170:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35639 and previous config saved to /var/cache/conftool/dbconfig/20221020-145214-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35638 and previous config saved to /var/cache/conftool/dbconfig/20221020-145152-ladsgroup.json
  • 14:51 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host idp-test1002.wikimedia.org
  • 14:50 pt1979@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wdqs2005']
  • 14:50 pt1979@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wdqs2005']
  • 14:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P35637 and previous config saved to /var/cache/conftool/dbconfig/20221020-145009-ladsgroup.json
  • 14:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T321312)', diff saved to https://phabricator.wikimedia.org/P35636 and previous config saved to /var/cache/conftool/dbconfig/20221020-144823-ladsgroup.json
  • 14:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121', diff saved to https://phabricator.wikimedia.org/P35635 and previous config saved to /var/cache/conftool/dbconfig/20221020-144655-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T321312)', diff saved to https://phabricator.wikimedia.org/P35634 and previous config saved to /var/cache/conftool/dbconfig/20221020-144150-ladsgroup.json
  • 14:41 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T321312)', diff saved to https://phabricator.wikimedia.org/P35633 and previous config saved to /var/cache/conftool/dbconfig/20221020-144125-ladsgroup.json
  • 14:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 36012
  • 14:37 papaul: powerdown wdqs2005 for maintenance
  • 14:37 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 36012
  • 14:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P35632 and previous config saved to /var/cache/conftool/dbconfig/20221020-143646-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104', diff saved to https://phabricator.wikimedia.org/P35631 and previous config saved to /var/cache/conftool/dbconfig/20221020-143502-ladsgroup.json
  • 14:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1121 (T321312)', diff saved to https://phabricator.wikimedia.org/P35630 and previous config saved to /var/cache/conftool/dbconfig/20221020-143148-ladsgroup.json
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35629 and previous config saved to /var/cache/conftool/dbconfig/20221020-142618-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1121 (T321312)', diff saved to https://phabricator.wikimedia.org/P35628 and previous config saved to /var/cache/conftool/dbconfig/20221020-142331-ladsgroup.json
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 14:22 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1121.eqiad.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P35627 and previous config saved to /var/cache/conftool/dbconfig/20221020-142139-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35626 and previous config saved to /var/cache/conftool/dbconfig/20221020-141956-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2104 (T318955)', diff saved to https://phabricator.wikimedia.org/P35625 and previous config saved to /var/cache/conftool/dbconfig/20221020-141736-ladsgroup.json
  • 14:17 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2104.codfw.wmnet with reason: Maintenance
  • 14:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host etherpad1003.eqiad.wmnet
  • 14:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35624 and previous config saved to /var/cache/conftool/dbconfig/20221020-141112-ladsgroup.json
  • 14:11 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1001.eqiad.wmnet
  • 14:09 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:09 jbond@cumin1001: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host etherpad1003.eqiad.wmnet
  • 14:08 jbond@cumin1001: START - Cookbook sre.hosts.reboot-single for host etherpad1003.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35623 and previous config saved to /var/cache/conftool/dbconfig/20221020-140633-ladsgroup.json
  • 14:04 btullis@cumin1001: START - Cookbook sre.hosts.reboot-single for host an-db1001.eqiad.wmnet
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1158 (T318950)', diff saved to https://phabricator.wikimedia.org/P35622 and previous config saved to /var/cache/conftool/dbconfig/20221020-140423-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 14:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2097.codfw.wmnet with reason: Maintenance
  • 13:58 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 2906
  • 13:57 btullis: building production-images on build2001 - to build spark T318730
  • 13:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T321312)', diff saved to https://phabricator.wikimedia.org/P35621 and previous config saved to /var/cache/conftool/dbconfig/20221020-135605-ladsgroup.json
  • 13:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 2906
  • 13:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8966
  • 13:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8966
  • 13:36 urbanecm@deploy1002: Finished scap: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318) (duration: 06m 59s)
  • 13:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 1239
  • 13:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20115
  • 13:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20115
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7843
  • 13:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7843
  • 13:29 urbanecm@deploy1002: urbanecm and arlolra: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:29 urbanecm@deploy1002: Started scap: Backport for Disable wgParserEnableLegacyMediaDOM on viwiki (T314318)
  • 13:28 btullis@cumin1001: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
  • 13:27 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185) (duration: 05m 18s)
  • 13:23 btullis@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:23 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:22 urbanecm@deploy1002: urbanecm and urbanecm: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:21 urbanecm@deploy1002: Started scap: Backport for DataTableCellMentee: Strike-through suppressed mentees (T319185)
  • 13:21 btullis@cumin1001: END (FAIL) - Cookbook sre.postgresql.postgres-init (exit_code=99)
  • 13:20 btullis@cumin1001: START - Cookbook sre.postgresql.postgres-init
  • 13:19 urbanecm@deploy1002: Finished scap: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859) (duration: 06m 57s)
  • 13:18 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12200
  • 13:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12200
  • 13:13 urbanecm@deploy1002: urbanecm and stang: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:12 urbanecm@deploy1002: Started scap: Backport for zhwiki: Add 20 years logos (T320859), zhwiki: Update 20 years logos in logos.php and IS.php (T320859)
  • 13:12 urbanecm@deploy1002: Finished scap: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258) (duration: 06m 03s)
  • 13:11 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 199524
  • 13:10 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 199524
  • 13:09 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132203
  • 13:08 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 132203
  • 13:08 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 45102
  • 13:06 urbanecm@deploy1002: urbanecm and stang: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 45102
  • 13:06 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36692
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Fix broken wordmarks/taglines (T320944 T321124 T321258)
  • 13:05 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 36692
  • 13:05 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36351
  • 13:04 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 36351
  • 13:04 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 32787
  • 13:02 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 32787
  • 13:02 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 29791
  • 13:01 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 29791
  • 13:01 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 19151
  • 13:00 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 19151
  • 13:00 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 16509
  • 12:58 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 16509
  • 12:58 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276
  • 12:55 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 16276
  • 12:55 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 15169
  • 12:52 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 15169
  • 12:52 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 14061
  • 12:50 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 14061
  • 12:50 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 13335
  • 12:48 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 13335
  • 12:48 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11404
  • 12:47 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 11404
  • 12:47 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 11164
  • 12:46 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 11164
  • 12:46 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 10310
  • 12:44 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 10310
  • 12:44 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8966
  • 12:43 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 8966
  • 12:43 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4637
  • 12:42 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4637
  • 12:42 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3292
  • 12:41 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 3292
  • 12:41 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2647
  • 12:39 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 2647
  • 12:39 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2152
  • 12:39 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 2152
  • 12:39 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 714
  • 12:37 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 714
  • 12:37 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 42
  • 12:35 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 42
  • 12:33 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 26803
  • 12:33 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 26803
  • 12:33 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9505
  • 12:31 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 9505
  • 12:31 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9498
  • 12:31 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 9498
  • 12:31 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 8674
  • 12:30 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 8674
  • 12:30 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7713
  • 12:29 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7713
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7575
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7575
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7091
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 7091
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5650
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 5650
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4780
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4780
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4766
  • 12:27 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4766
  • 12:27 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4648
  • 12:26 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 4648
  • 12:26 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3856
  • 12:25 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 3856
  • 12:25 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 1280
  • 12:25 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 1280
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:05 cgoubert@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:04 claime: Deploying new mw-debug namespace
  • 12:01 cgoubert@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:47 jbond: roll update for libksba
  • 11:16 jbond: upload new pypuppetdb package
  • 09:57 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 63949
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 63949
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 58453
  • 09:56 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 58453
  • 09:56 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 36012
  • 09:55 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 36012
  • 09:55 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 24429
  • 09:54 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 24429
  • 09:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16591
  • 09:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16591
  • 09:53 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16265
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16265
  • 09:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12200
  • 09:52 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12200
  • 09:51 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 9269
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 9269
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6327
  • 09:50 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 6327
  • 09:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 3605
  • 09:49 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3605
  • 09:49 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 2828
  • 09:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2828
  • 09:48 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2518
  • 09:47 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2518
  • 09:47 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2516
  • 09:05 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 13335
  • 08:57 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 13335
  • 08:35 XioNoX: re-enabling Arelion on cr1-drmrs - T321157
  • 08:07 hashar@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.6 refs T320511
  • 07:52 godog: +40 to k8s-mlserve on prometheus codfw
  • 07:49 apergos: UTC morning backport and config training window closed
  • 07:11 kartik@deploy1002: Finished scap: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176) (duration: 06m 41s)
  • 07:04 kartik@deploy1002: kartik and kartik: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 07:04 kartik@deploy1002: Started scap: Backport for testwiki: Enable Section Translation in 15 Wikipedias (T319175 T319176)
  • 06:53 kart_: Updated cxserver to 2022-10-18-161640-production (T317224, T319175, T319176)
  • 06:52 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:51 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:48 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:47 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:42 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:41 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
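
The cookbook entries above follow a regular shape ("END (STATUS) - Cookbook NAME (exit_code=N) ..."), so a day's outcomes can be tallied mechanically. The following is a minimal sketch, assuming entries are available as plain-text lines in the format shown in this log; the regular expression and the function name `tally_cookbook_results` are illustrative and not part of any Wikimedia tooling.

```python
import re
from collections import Counter

# Matches cookbook completion entries as they appear in this log, e.g.
#   12:55 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) ...
# The pattern is an assumption derived from the entries above, not an official format spec.
END_RE = re.compile(
    r"(?P<time>\d{2}:\d{2}) (?P<who>\S+): END \((?P<status>[A-Z]+)\) - "
    r"Cookbook (?P<cookbook>\S+) \(exit_code=(?P<code>\d+)\)"
)


def tally_cookbook_results(lines):
    """Count END entries per (cookbook name, status); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = END_RE.search(line)
        if match:
            counts[(match["cookbook"], match["status"])] += 1
    return counts


# Three entries copied from the log above: two END lines and one START line.
sample = [
    "• 12:58 ayounsi@cumin2002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16276",
    "• 12:55 ayounsi@cumin2002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 15169",
    "• 12:52 ayounsi@cumin2002: START - Cookbook sre.network.peering with action 'email' for AS: 15169",
]
result = tally_cookbook_results(sample)
for (cookbook, status), count in sorted(result.items()):
    print(f"{cookbook} {status}: {count}")
```

START entries and free-form notes do not match the pattern and are skipped, so the tally reflects completed runs only.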

2022-10-19

  • 23:33 wfan: civicrm upgraded from 477323fe to c96dd3ae
  • 23:24 ejegg: re-enabled refund queue consumer job
  • 22:43 ejegg: updated fundraising CiviCRM from 4b9e981a to 477323fe
  • 22:33 ejegg: disabled fundraising refund QC, started language update job
  • 22:02 ejegg: updated standalone SmashPig IPN listener from f36143f0 to 9295dc2a
  • 22:00 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1003.eqiad.wmnet
  • 21:49 eileen: config revision changed from 903c8ce2 to 4aad20b1
  • 21:37 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1003.eqiad.wmnet on all recursors
  • 21:36 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1003.eqiad.wmnet on all recursors
  • 21:36 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:30 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 21:30 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1003.eqiad.wmnet
  • 21:30 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1002.eqiad.wmnet
  • 21:06 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1002.eqiad.wmnet on all recursors
  • 21:06 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1002.eqiad.wmnet on all recursors
  • 21:06 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:03 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 21:03 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1002.eqiad.wmnet
  • 20:49 mutante: lvs2010, lvs2008 - systemctl restart pybal.service ; ipvsadm -Dt '208.80.153.250:22' ; ipvsadm -Dt '[2620:0:860:ed1a::3:fa]:22' - T296022
  • 20:44 mutante: lvs1020, lvs1018 - systemctl restart pybal.service ; ipvsadm -Dt '208.80.154.250:22' ; ipvsadm -Dt '[2620:0:861:ed1a::3:16]:22' - T296022
  • 20:38 mutante: puppetmaster1001/puppetmaster2001 - delete .git-*.err files in /var/run/confd-template T296022
  • 20:33 TheresNoTime: closing UTC late backport window
  • 20:31 ejegg: updated payments-wiki from 4e1f308b to 9fa4abd7
  • 20:27 mutante: lvs2010 - restarted pybal, removed git-ssh IP with ipvsadm
  • 20:08 samtar@deploy1002: samtar and samtar: Backport for Hooks: Log to statsd when a page is noindex'd (T310974) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:07 samtar@deploy1002: Started scap: Backport for Hooks: Log to statsd when a page is noindex'd (T310974)
  • 19:46 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on dbstore1003.eqiad.wmnet with reason: Maintenance
  • 19:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35617 and previous config saved to /var/cache/conftool/dbconfig/20221019-194546-ladsgroup.json
  • 19:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P35616 and previous config saved to /var/cache/conftool/dbconfig/20221019-193039-ladsgroup.json
  • 19:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P35615 and previous config saved to /var/cache/conftool/dbconfig/20221019-191533-ladsgroup.json
  • 19:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35614 and previous config saved to /var/cache/conftool/dbconfig/20221019-190026-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1200 (T318955)', diff saved to https://phabricator.wikimedia.org/P35613 and previous config saved to /var/cache/conftool/dbconfig/20221019-185813-ladsgroup.json
  • 18:58 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 18:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35612 and previous config saved to /var/cache/conftool/dbconfig/20221019-185752-ladsgroup.json
  • 18:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P35611 and previous config saved to /var/cache/conftool/dbconfig/20221019-184245-ladsgroup.json
  • 18:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P35610 and previous config saved to /var/cache/conftool/dbconfig/20221019-182739-ladsgroup.json
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35609 and previous config saved to /var/cache/conftool/dbconfig/20221019-181232-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1185 (T318955)', diff saved to https://phabricator.wikimedia.org/P35608 and previous config saved to /var/cache/conftool/dbconfig/20221019-181019-ladsgroup.json
  • 18:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 18:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35607 and previous config saved to /var/cache/conftool/dbconfig/20221019-180958-ladsgroup.json
  • 18:07 mutante: aphlict1001 - manually gzip large logfile, logrotate did not run for a day - T321209
  • 18:03 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 18:01 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:57 cmjohnson@cumin1001: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P35606 and previous config saved to /var/cache/conftool/dbconfig/20221019-175451-ladsgroup.json
  • 17:54 cmjohnson@cumin1001: START - Cookbook sre.hosts.provision for host druid1009.mgmt.eqiad.wmnet with reboot policy FORCED
  • 17:49 cmjohnson@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:46 cmjohnson@cumin1001: START - Cookbook sre.dns.netbox
  • 17:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P35605 and previous config saved to /var/cache/conftool/dbconfig/20221019-173945-ladsgroup.json
  • 17:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35604 and previous config saved to /var/cache/conftool/dbconfig/20221019-172438-ladsgroup.json
  • 17:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1161 (T318955)', diff saved to https://phabricator.wikimedia.org/P35603 and previous config saved to /var/cache/conftool/dbconfig/20221019-170002-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 16:43 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35602 and previous config saved to /var/cache/conftool/dbconfig/20221019-163534-ladsgroup.json
  • 16:31 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1002.eqiad.wmnet with OS bullseye
  • 16:27 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:27 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:23 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 16:22 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 16:21 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 16:20 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-logging1004.eqiad.wmnet with reason: host reimage
  • 16:20 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35601 and previous config saved to /var/cache/conftool/dbconfig/20221019-162028-ladsgroup.json
  • 16:19 mutante: wikitech - added herron to 'content administrators'
  • 16:17 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1002.eqiad.wmnet with reason: host reimage
  • 16:14 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1002.eqiad.wmnet with reason: host reimage
  • 16:08 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:08 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 16:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315', diff saved to https://phabricator.wikimedia.org/P35600 and previous config saved to /var/cache/conftool/dbconfig/20221019-160521-ladsgroup.json
  • 16:01 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1002.eqiad.wmnet with OS bullseye
  • 15:52 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:51 cmjohnson@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:51 jhathaway@cumin1001: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host aux-k8s-etcd1001.eqiad.wmnet
  • 15:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35599 and previous config saved to /var/cache/conftool/dbconfig/20221019-155015-ladsgroup.json
  • 15:41 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host kafka-logging1004.eqiad.wmnet with OS bullseye
  • 15:28 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-etcd1001.eqiad.wmnet on all recursors
  • 15:28 jhathaway@cumin1001: START - Cookbook sre.dns.wipe-cache aux-k8s-etcd1001.eqiad.wmnet on all recursors
  • 15:28 jhathaway@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1144:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35598 and previous config saved to /var/cache/conftool/dbconfig/20221019-152318-ladsgroup.json
  • 15:23 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:23 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1144.eqiad.wmnet with reason: Maintenance
  • 15:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35597 and previous config saved to /var/cache/conftool/dbconfig/20221019-152256-ladsgroup.json
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35596 and previous config saved to /var/cache/conftool/dbconfig/20221019-151204-ladsgroup.json
  • 15:08 bd808: Forcing puppet runs on cloudweb100[34] to deploy a new version of Striker
  • 15:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P35595 and previous config saved to /var/cache/conftool/dbconfig/20221019-150749-ladsgroup.json
  • 15:00 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 15:00 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 15:00 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 14:59 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99)
  • 14:59 bking@cumin2002: START - Cookbook sre.wdqs.data-reload
  • 14:57 jhathaway@cumin1001: START - Cookbook sre.dns.netbox
  • 14:57 jhathaway@cumin1001: START - Cookbook sre.ganeti.makevm for new host aux-k8s-etcd1001.eqiad.wmnet
  • 14:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P35594 and previous config saved to /var/cache/conftool/dbconfig/20221019-145658-ladsgroup.json
  • 14:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130', diff saved to https://phabricator.wikimedia.org/P35593 and previous config saved to /var/cache/conftool/dbconfig/20221019-145242-ladsgroup.json
  • 14:50 jnuche@deploy1002: Installation of scap version "4.27.1" completed for 1 hosts
  • 14:50 jnuche@deploy1002: Installing scap version "4.27.1" for 1 hosts
  • 14:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136', diff saved to https://phabricator.wikimedia.org/P35592 and previous config saved to /var/cache/conftool/dbconfig/20221019-144150-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35591 and previous config saved to /var/cache/conftool/dbconfig/20221019-143736-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1130 (T318955)', diff saved to https://phabricator.wikimedia.org/P35590 and previous config saved to /var/cache/conftool/dbconfig/20221019-143523-ladsgroup.json
  • 14:35 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1130.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35589 and previous config saved to /var/cache/conftool/dbconfig/20221019-143501-ladsgroup.json
  • 14:34 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 2516
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 2516
  • 14:34 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 1239
  • 14:34 hashar@deploy1002: Finished scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) (duration: 04m 25s)
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 1239
  • 14:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 701
  • 14:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 701
  • 14:29 hashar@deploy1002: hashar and hashar: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 14:29 hashar@deploy1002: Started scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160)
  • 14:29 hashar@deploy1002: backport aborted: (duration: 05m 09s)
  • 14:29 hashar@deploy1002: sync-world aborted: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) (duration: 03m 27s)
  • 14:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35588 and previous config saved to /var/cache/conftool/dbconfig/20221019-142643-ladsgroup.json
  • 14:25 hashar@deploy1002: hashar and hashar: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 14:25 hashar@deploy1002: Started scap: Backport for Downgrade lcobucci/jwt (4.2.1 => 4.1.5) (T321160)
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1136 (T318950)', diff saved to https://phabricator.wikimedia.org/P35587 and previous config saved to /var/cache/conftool/dbconfig/20221019-142433-ladsgroup.json
  • 14:24 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1136.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35586 and previous config saved to /var/cache/conftool/dbconfig/20221019-142411-ladsgroup.json
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P35585 and previous config saved to /var/cache/conftool/dbconfig/20221019-141955-ladsgroup.json
  • 14:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-db1001.eqiad.wmnet with OS bullseye
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P35584 and previous config saved to /var/cache/conftool/dbconfig/20221019-140905-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315', diff saved to https://phabricator.wikimedia.org/P35583 and previous config saved to /var/cache/conftool/dbconfig/20221019-140449-ladsgroup.json
  • 14:02 matthiasmullie: UTC afternoon backports done
  • 14:00 mlitn@deploy1002: Finished scap: Backport for Add SearchVue to extension-list and config var (T310367) (duration: 26m 03s)
  • 13:55 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-db1001.eqiad.wmnet with reason: host reimage
  • 13:53 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127', diff saved to https://phabricator.wikimedia.org/P35582 and previous config saved to /var/cache/conftool/dbconfig/20221019-135358-ladsgroup.json
  • 13:52 btullis@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-db1001.eqiad.wmnet with reason: host reimage
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35581 and previous config saved to /var/cache/conftool/dbconfig/20221019-134942-ladsgroup.json
  • 13:43 mlitn@deploy1002: mlitn and mlitn: Backport for Add SearchVue to extension-list and config var (T310367) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 13:40 btullis@cumin1001: START - Cookbook sre.hosts.reimage for host an-db1001.eqiad.wmnet with OS bullseye
  • 13:38 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35580 and previous config saved to /var/cache/conftool/dbconfig/20221019-133852-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1127 (T318950)', diff saved to https://phabricator.wikimedia.org/P35579 and previous config saved to /var/cache/conftool/dbconfig/20221019-133642-ladsgroup.json
  • 13:36 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1127.eqiad.wmnet with reason: Maintenance
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35578 and previous config saved to /var/cache/conftool/dbconfig/20221019-133620-ladsgroup.json
  • 13:34 mlitn@deploy1002: Started scap: Backport for Add SearchVue to extension-list and config var (T310367)
  • 13:32 mlitn@deploy1002: Finished scap: Backport for Add default value for search-thumbnail-extra-namespaces (T320337) (duration: 04m 41s)
  • 13:31 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8220
  • 13:28 hashar@deploy1002: backport Cancelled
  • 13:28 mlitn@deploy1002: mlitn and mlitn: Backport for Add default value for search-thumbnail-extra-namespaces (T320337) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:27 mlitn@deploy1002: Started scap: Backport for Add default value for search-thumbnail-extra-namespaces (T320337)
  • 13:27 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8220
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1113:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35577 and previous config saved to /var/cache/conftool/dbconfig/20221019-132527-ladsgroup.json
  • 13:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1113.eqiad.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35576 and previous config saved to /var/cache/conftool/dbconfig/20221019-132505-ladsgroup.json
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35575 and previous config saved to /var/cache/conftool/dbconfig/20221019-132114-ladsgroup.json
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P35574 and previous config saved to /var/cache/conftool/dbconfig/20221019-130959-ladsgroup.json
  • 13:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317', diff saved to https://phabricator.wikimedia.org/P35573 and previous config saved to /var/cache/conftool/dbconfig/20221019-130607-ladsgroup.json
  • 13:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35572 and previous config saved to /var/cache/conftool/dbconfig/20221019-130459-ladsgroup.json
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110', diff saved to https://phabricator.wikimedia.org/P35571 and previous config saved to /var/cache/conftool/dbconfig/20221019-125452-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35570 and previous config saved to /var/cache/conftool/dbconfig/20221019-125101-ladsgroup.json
  • 12:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P35569 and previous config saved to /var/cache/conftool/dbconfig/20221019-124952-ladsgroup.json
  • 12:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35568 and previous config saved to /var/cache/conftool/dbconfig/20221019-123946-ladsgroup.json
  • 12:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131', diff saved to https://phabricator.wikimedia.org/P35567 and previous config saved to /var/cache/conftool/dbconfig/20221019-123446-ladsgroup.json
  • 12:27 XioNoX: remove cr4-ulsfo SV8 RS sessions
  • 12:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35566 and previous config saved to /var/cache/conftool/dbconfig/20221019-121939-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1110 (T318955)', diff saved to https://phabricator.wikimedia.org/P35565 and previous config saved to /var/cache/conftool/dbconfig/20221019-121506-ladsgroup.json
  • 12:15 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1110.eqiad.wmnet with reason: Maintenance
  • 12:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35564 and previous config saved to /var/cache/conftool/dbconfig/20221019-121444-ladsgroup.json
  • 11:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P35563 and previous config saved to /var/cache/conftool/dbconfig/20221019-115938-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1101:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35562 and previous config saved to /var/cache/conftool/dbconfig/20221019-115443-ladsgroup.json
  • 11:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1101.eqiad.wmnet with reason: Maintenance
  • 11:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35561 and previous config saved to /var/cache/conftool/dbconfig/20221019-115421-ladsgroup.json
  • 11:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315', diff saved to https://phabricator.wikimedia.org/P35560 and previous config saved to /var/cache/conftool/dbconfig/20221019-114431-ladsgroup.json
  • 11:43 jnuche@deploy1002: Installation of scap version "4.27.1" completed for 552 hosts
  • 11:43 jnuche@deploy1002: Installing scap version "4.27.1" for 552 hosts
  • 11:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P35559 and previous config saved to /var/cache/conftool/dbconfig/20221019-113915-ladsgroup.json
  • 11:30 jnuche@deploy1002: Installing scap version "4.27.1" for 553 hosts
  • 11:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35558 and previous config saved to /var/cache/conftool/dbconfig/20221019-112925-ladsgroup.json
  • 11:25 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317', diff saved to https://phabricator.wikimedia.org/P35557 and previous config saved to /var/cache/conftool/dbconfig/20221019-112409-ladsgroup.json
  • 11:23 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:22 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:14 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:10 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:10 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35556 and previous config saved to /var/cache/conftool/dbconfig/20221019-110902-ladsgroup.json
  • 11:07 Emperor: upload wmf-beamer-style 0.2 to apt
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1131 (T314041)', diff saved to https://phabricator.wikimedia.org/P35555 and previous config saved to /var/cache/conftool/dbconfig/20221019-110635-ladsgroup.json
  • 11:06 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:06 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1131.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1098:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35554 and previous config saved to /var/cache/conftool/dbconfig/20221019-110552-ladsgroup.json
  • 11:05 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:05 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1098.eqiad.wmnet with reason: Maintenance
  • 11:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1096:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35553 and previous config saved to /var/cache/conftool/dbconfig/20221019-110308-ladsgroup.json
  • 11:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 11:02 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db1096.eqiad.wmnet with reason: Maintenance
  • 10:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:17 claime: Deploying mediawiki helm chart v0.2.4 on k8s-experimental mwdebug - T321042
  • 10:08 volans@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:58 volans@cumin1001: START - Cookbook sre.dns.netbox
  • 09:23 urbanecm@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 09:21 urbanecm@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 09:21 urbanecm@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 09:18 urbanecm@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 09:14 urbanecm@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 09:13 urbanecm@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 09:01 XioNoX: remove DHCP server and access zone on mr1-eqiad - T320962
  • 08:15 hashar@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.6 refs T320511 (duration: 03m 37s)
  • 08:12 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.6 refs T320511
  • 07:45 urbanecm@deploy1002: Finished scap: Backport for [growth] Turn mentorship off by default (T321056) (duration: 05m 14s)
  • 07:41 urbanecm@deploy1002: urbanecm and urbanecm: Backport for [growth] Turn mentorship off by default (T321056) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 07:40 urbanecm@deploy1002: Started scap: Backport for [growth] Turn mentorship off by default (T321056)
  • 07:39 urbanecm@deploy1002: Finished scap: Backport for Remove GEHomepageImpactModuleEnabled (duration: 04m 27s)
  • 07:35 urbanecm@deploy1002: urbanecm and kharlan: Backport for Remove GEHomepageImpactModuleEnabled synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:34 urbanecm@deploy1002: Started scap: Backport for Remove GEHomepageImpactModuleEnabled
  • 07:09 kartik@deploy1002: Finished scap: Backport for Enable specialcontribute campaign (T319306) (duration: 06m 11s)
  • 07:03 kartik@deploy1002: kartik and kartik: Backport for Enable specialcontribute campaign (T319306) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 07:03 kartik@deploy1002: Started scap: Backport for Enable specialcontribute campaign (T319306)
  • 06:40 XioNoX: enabled graceful-shutdown on drmrs Arelion BGP
  • 01:35 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1011.eqiad.wmnet with OS bullseye
  • 01:20 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 01:17 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1011.eqiad.wmnet with reason: host reimage
  • 01:05 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1011.eqiad.wmnet with OS bullseye
  • 01:04 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1009.eqiad.wmnet with OS bullseye
  • 00:50 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 00:46 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1009.eqiad.wmnet with reason: host reimage
  • 00:34 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1009.eqiad.wmnet with OS bullseye
  • 00:34 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-presto1007.eqiad.wmnet with OS bullseye
  • 00:14 cmjohnson@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage
  • 00:11 cmjohnson@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on an-presto1007.eqiad.wmnet with reason: host reimage

2022-10-18

  • 23:58 cmjohnson@cumin1001: START - Cookbook sre.hosts.reimage for host an-presto1007.eqiad.wmnet with OS bullseye
  • 22:18 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:17 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 21:11 mutante: otrs1001 - emptied exim paniclog
  • 21:03 TheresNoTime: closing UTC late backport window
  • 21:02 samtar@deploy1002: Finished scap: Backport for arwiki: Fix editeditorprotected restriction level (T321111) (duration: 07m 08s)
  • 20:56 samtar@deploy1002: samtar and stang: Backport for arwiki: Fix editeditorprotected restriction level (T321111) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:55 samtar@deploy1002: Started scap: Backport for arwiki: Fix editeditorprotected restriction level (T321111)
  • 20:54 samtar@deploy1002: Finished scap: Backport for tumwiki: Update project logo (T320473) (duration: 05m 19s)
  • 20:50 mutante: phabricator - on new machines, `find / -uid 497 -exec chown phd {} \;` to fix privileges (and then the same for -gid 498). The user phd used to be 497:498 (uid:gid) on old hosts but has been replaced with a proper systemd system user using 920:920 - T313360
  • 20:49 samtar@deploy1002: samtar and stang: Backport for tumwiki: Update project logo (T320473) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:48 samtar@deploy1002: Started scap: Backport for tumwiki: Update project logo (T320473)
  • 20:46 samtar@deploy1002: Finished scap: Backport for i18n: Fix typo and simplify preference description (T321038) (duration: 16m 31s)
  • 20:34 samtar@deploy1002: samtar and jdlrobson: Backport for i18n: Fix typo and simplify preference description (T321038) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:29 samtar@deploy1002: Started scap: Backport for i18n: Fix typo and simplify preference description (T321038)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Move icons to dedicated folder, Standardize wordmark names (duration: 11m 07s)
  • 20:03 samtar@deploy1002: samtar and jdlrobson: Backport for Move icons to dedicated folder, Standardize wordmark names synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for Move icons to dedicated folder, Standardize wordmark names
  • 18:58 mutante: rsyncing phab dump file - pull from phab1000 to all other hosts T313360
  • 16:44 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 16:38 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5650
  • 16:35 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5650
  • 16:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 100%: 10', diff saved to https://phabricator.wikimedia.org/P35549 and previous config saved to /var/cache/conftool/dbconfig/20221018-163219-ladsgroup.json
  • 16:17 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 75%: 10', diff saved to https://phabricator.wikimedia.org/P35548 and previous config saved to /var/cache/conftool/dbconfig/20221018-161714-ladsgroup.json
  • 16:02 ladsgroup@cumin1001: dbctl commit (dc=all): 'db1131 (re)pooling @ 25%: 10', diff saved to https://phabricator.wikimedia.org/P35547 and previous config saved to /var/cache/conftool/dbconfig/20221018-160209-ladsgroup.json
  • 15:55 hashar: Stopping Gerrit due to a mistake in deploying plugin (forgot to reinstall the builtin plugins)
  • 15:51 hashar@deploy1002: Finished deploy [gerrit/gerrit@da5de16]: gerrit1001: remove motd plugin and its config # T321075 (duration: 00m 08s)
  • 15:51 hashar@deploy1002: Started deploy [gerrit/gerrit@da5de16]: gerrit1001: remove motd plugin and its config # T321075
  • 15:50 hashar@deploy1002: Finished deploy [gerrit/gerrit@da5de16]: gerrit2002: remove motd plugin and its config # T321075 (duration: 00m 10s)
  • 15:49 hashar@deploy1002: Started deploy [gerrit/gerrit@da5de16]: gerrit2002: remove motd plugin and its config # T321075
  • 15:15 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:53 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:59 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:57 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 13:53 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549) (duration: 04m 56s)
  • 13:48 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:48 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable AddLink backend for be_x_oldwiki (T304549)
  • 13:38 kostajh: UTC afternoon backport+config window done \o/
  • 13:35 kharlan@deploy1002: Finished scap: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549) (duration: 04m 50s)
  • 13:30 kharlan@deploy1002: kharlan and kharlan: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:30 kharlan@deploy1002: Started scap: Backport for GrowthExperiments: Enable AddLink backend for bat_smg (T304549)
  • 13:07 kharlan@deploy1002: kharlan and matmarex: Backport for Add "Clear Affordances" to DiscussionTools beta feature on most wikis (T320683) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 kharlan@deploy1002: Started scap: Backport for Add "Clear Affordances" to DiscussionTools beta feature on most wikis (T320683)
  • 11:57 kharlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 11:55 kharlan@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 11:55 kharlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 11:52 kharlan@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 11:51 kharlan@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 11:50 kharlan@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 11:37 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.6 refs T320511
  • 11:08 claime: Nutcrackerd disabled on k8s-experimental mwdebug - T321042
  • 11:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:06 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:05 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:57 claime: Disabling nutcracker on k8s-experimental mwdebug - T321042
  • 10:17 urbanecm@deploy1002: Finished scap: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041) (duration: 06m 24s)
  • 10:11 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 10:11 urbanecm@deploy1002: Started scap: Backport for Revert "Add multiple integration tests for Hooks.php" (T321041)
  • 08:50 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group0 wikis to 1.40.0-wmf.6" # T320511
  • 08:35 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.6 refs T320511
  • 08:28 hashar@deploy1002: Pruned MediaWiki: 1.40.0-wmf.4 (duration: 02m 11s)
  • 08:26 hashar: scap clean auto # T320511
  • 08:23 hashar@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.6 refs T320511 (duration: 36m 04s)
  • 07:47 hashar@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.6 refs T320511
  • 07:40 hashar: `scap stage-train 1.40.0-wmf.6` # T320511
  • 07:37 hashar: Scratched /srv/mediawiki-staging/php-1.40.0-wmf.6 entirely and doing `scap prep` instead
  • 07:35 hashar: Rebased /srv/mediawiki-staging/php-1.40.0-wmf.6 for de15f77 ( T321021 ) and 0f8be84 ( T319447 )
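
The ownership fix noted in the 20:50 entry above (files still owned by the old numeric UID 497 after the phd user moved to 920) could be wrapped roughly as follows. This is only a sketch: the helper name `reown_by_uid` and its arguments are hypothetical and do not come from the log, and `-xdev` and `chown -h` are added precautions that the logged command did not use.

```shell
# Hypothetical helper (not from the log): re-own everything under ROOT
# that still carries a stale numeric UID.
reown_by_uid() {
    root=$1
    old_uid=$2
    new_owner=$3
    # -xdev keeps find on one filesystem; "{} +" batches many paths into
    # each chown call; chown -h changes a symlink itself, not its target.
    find "$root" -xdev -uid "$old_uid" -exec chown -h "$new_owner" {} +
}

# Example call (needs root; values taken from the log entry):
#   reown_by_uid / 497 phd
```

Running the same `find` expression with `-print` in place of the `-exec` clause first gives a preview of what would be changed. Because of `-xdev`, other mounted filesystems would each need their own call, which is a deliberate difference from the single `find /` in the log.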

2022-10-17

  • 23:16 bblack@puppetmaster2001: conftool action : set/pooled=yes; selector: service=git-ssh
  • 23:16 bblack@puppetmaster2001: conftool action : set/weight=100; selector: service=git-ssh
  • 22:55 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for otrs1001.eqiad.wmnet
  • 22:55 dzahn@cumin2002: START - Cookbook sre.hosts.remove-downtime for otrs1001.eqiad.wmnet
  • 22:41 mutante: otrs1001 - systemctl reset-failed (clear alert for ifup@ens13.service)
  • 22:36 bblack: ganeti1027 - gnt-instance reboot otrs1001.eqiad.wmnet
  • 22:36 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on otrs1001.eqiad.wmnet with reason: reboot
  • 22:35 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on otrs1001.eqiad.wmnet with reason: reboot
  • 22:34 bblack: ganeti1027: executing gnt-instance modify -B maxmem=8192 -B memory=8192 otrs1001.eqiad.wmnet
  • 21:33 mutante: otrs1001 - after local exim queue has been drained, set MaxThreads for clamav to 12 again, restarted clamav
  • 21:33 mstyles@deploy1002: Synchronized php-1.40.0-wmf.5/extensions/CheckUser/src/Api/ApiQueryCheckUser.php: (no justification provided) (duration: 03m 37s)
  • 21:20 mutante: otrs1001 - re-enabling puppet, running puppet
  • 21:09 mutante: otrs1001 - changing MaxThreads from 6 to 1 in /etc/clamav/clamd.conf, starting clamav
  • 21:02 mutante: otrs1001 - temp disabled puppet, changing MaxThreads from 12 to 6 in /etc/clamav/clamd.conf
  • 20:40 mutante: mx1001 - exim4 -qf - trying to re-deliver mail in queue for info@ OTRS queue
  • 20:18 urbanecm@deploy1002: Finished scap: 6762292a4: e320d48c8: 6762292a4: DiscussionTools/WikimediaEvents backports (T315688, T315689, T320938) (duration: 04m 35s)
  • 20:13 urbanecm@deploy1002: Started scap: 6762292a4: e320d48c8: 6762292a4: DiscussionTools/WikimediaEvents backports (T315688, T315689, T320938)
  • 19:58 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 19:57 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 19:20 mutante: otrs1001 - started failed clamav-daemon service
  • 18:57 mutante: puppetmaster2001 - deleted confd-template .err files
  • 18:56 mutante: puppetmaster1001 - deleted confd-template .err files
  • 18:49 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: name=phab1001-vcs.eqiad.wmnet
  • 18:48 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: name=phab2001-vcs.codfw.wmnet
  • 18:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35544 and previous config saved to /var/cache/conftool/dbconfig/20221017-181217-ladsgroup.json
  • 17:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35543 and previous config saved to /var/cache/conftool/dbconfig/20221017-175711-ladsgroup.json
  • 17:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P35542 and previous config saved to /var/cache/conftool/dbconfig/20221017-174204-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35541 and previous config saved to /var/cache/conftool/dbconfig/20221017-172658-ladsgroup.json
  • 17:19 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 32787
  • 17:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 32787
  • 17:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2177 (T318955)', diff saved to https://phabricator.wikimedia.org/P35540 and previous config saved to /var/cache/conftool/dbconfig/20221017-171229-ladsgroup.json
  • 17:12 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:12 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 17:11 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35539 and previous config saved to /var/cache/conftool/dbconfig/20221017-171156-ladsgroup.json
  • 16:56 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35538 and previous config saved to /var/cache/conftool/dbconfig/20221017-165649-ladsgroup.json
  • 16:41 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P35537 and previous config saved to /var/cache/conftool/dbconfig/20221017-164143-ladsgroup.json
  • 16:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35536 and previous config saved to /var/cache/conftool/dbconfig/20221017-162636-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2156 (T318955)', diff saved to https://phabricator.wikimedia.org/P35535 and previous config saved to /var/cache/conftool/dbconfig/20221017-161843-ladsgroup.json
  • 16:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 16:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35534 and previous config saved to /var/cache/conftool/dbconfig/20221017-161806-ladsgroup.json
  • 16:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35533 and previous config saved to /var/cache/conftool/dbconfig/20221017-161330-ladsgroup.json
  • 16:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35532 and previous config saved to /var/cache/conftool/dbconfig/20221017-160259-ladsgroup.json
  • 15:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P35531 and previous config saved to /var/cache/conftool/dbconfig/20221017-155823-ladsgroup.json
  • 15:47 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P35530 and previous config saved to /var/cache/conftool/dbconfig/20221017-154753-ladsgroup.json
  • 15:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P35529 and previous config saved to /var/cache/conftool/dbconfig/20221017-154317-ladsgroup.json
  • 15:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35528 and previous config saved to /var/cache/conftool/dbconfig/20221017-153246-ladsgroup.json
  • 15:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35527 and previous config saved to /var/cache/conftool/dbconfig/20221017-152810-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2182 (T318950)', diff saved to https://phabricator.wikimedia.org/P35526 and previous config saved to /var/cache/conftool/dbconfig/20221017-152552-ladsgroup.json
  • 15:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35525 and previous config saved to /var/cache/conftool/dbconfig/20221017-152531-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2149 (T318955)', diff saved to https://phabricator.wikimedia.org/P35524 and previous config saved to /var/cache/conftool/dbconfig/20221017-151808-ladsgroup.json
  • 15:18 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:17 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 15:10 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P35523 and previous config saved to /var/cache/conftool/dbconfig/20221017-151024-ladsgroup.json
  • 15:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 15:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35522 and previous config saved to /var/cache/conftool/dbconfig/20221017-150440-ladsgroup.json
  • 14:55 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2169:3317', diff saved to https://phabricator.wikimedia.org/P35521 and previous config saved to /var/cache/conftool/dbconfig/20221017-145517-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2169:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35518 and previous config saved to /var/cache/conftool/dbconfig/20221017-143753-ladsgroup.json
  • 14:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:37 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35517 and previous config saved to /var/cache/conftool/dbconfig/20221017-143731-ladsgroup.json
  • 14:37 papaul: ongoing maintenance on cr1-eqiad
  • 14:36 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=eqiad
  • 14:36 claime: Depooling eventgate-main in eqiad - T303543
  • 14:35 claime: Repooling eventgate-main in codfw - T303543
  • 14:34 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-main,name=codfw
  • 14:34 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109', diff saved to https://phabricator.wikimedia.org/P35516 and previous config saved to /var/cache/conftool/dbconfig/20221017-143427-ladsgroup.json
  • 14:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 14:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 14:29 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-main,name=codfw
  • 14:29 claime: Depooling eventgate-main in codfw - T303543
  • 14:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 14:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 14:25 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:24 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:23 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:23 claime: Repooling eventgate-analytics-external in eqiad - T303543
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:22 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35515 and previous config saved to /var/cache/conftool/dbconfig/20221017-142224-ladsgroup.json
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35514 and previous config saved to /var/cache/conftool/dbconfig/20221017-141921-ladsgroup.json
  • 14:16 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics-external,name=eqiad
  • 14:16 claime: Depooling eventgate-analytics-external in eqiad - T303543
  • 14:14 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics-external,name=codfw
  • 14:14 claime: Repooling eventgate-analytics-external in codfw - T303543
  • 14:13 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 14:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 14:09 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics-external,name=codfw
  • 14:09 claime: Depooling eventgate-analytics-external in codfw - T303543
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317', diff saved to https://phabricator.wikimedia.org/P35513 and previous config saved to /var/cache/conftool/dbconfig/20221017-140717-ladsgroup.json
  • 14:05 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:05 claime: Repooling eventgate-analytics in eqiad - T303543
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2109 (T318955)', diff saved to https://phabricator.wikimedia.org/P35512 and previous config saved to /var/cache/conftool/dbconfig/20221017-140452-ladsgroup.json
  • 14:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2109.codfw.wmnet with reason: Maintenance
  • 14:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35511 and previous config saved to /var/cache/conftool/dbconfig/20221017-140430-ladsgroup.json
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 14:00 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=eqiad
  • 14:00 claime: Depooling eventgate-analytics in eqiad - T303543
  • 13:57 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventgate-analytics,name=codfw
  • 13:56 claime: Repooling eventgate-analytics in codfw - T303543
  • 13:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 13:52 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35510 and previous config saved to /var/cache/conftool/dbconfig/20221017-135211-ladsgroup.json
  • 13:51 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-analytics,name=codfw
  • 13:50 claime: Depooling eventgate-analytics in codfw - T303543
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2168:3317 (T318950)', diff saved to https://phabricator.wikimedia.org/P35509 and previous config saved to /var/cache/conftool/dbconfig/20221017-134953-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35508 and previous config saved to /var/cache/conftool/dbconfig/20221017-134931-ladsgroup.json
  • 13:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105', diff saved to https://phabricator.wikimedia.org/P35507 and previous config saved to /var/cache/conftool/dbconfig/20221017-134924-ladsgroup.json
  • 13:49 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 claime: Repooled eventgate-logging-external in eqiad - T303543
  • 13:47 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 13:47 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 13:29 urbanecm@deploy1002: Finished scap: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532) (duration: 06m 24s)
  • 13:28 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventgate-logging-external,name=codfw
  • 13:27 claime: Depooling eventgate-logging-external in codfw - T303543
  • 13:23 urbanecm@deploy1002: urbanecm and sgimeno: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:23 urbanecm@deploy1002: Started scap: Backport for GrowthExperiments: enable the vue version of mentee overview in all wikis (T300532)
  • 13:22 urbanecm@deploy1002: Finished scap: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903) (duration: 04m 54s)
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P35504 and previous config saved to /var/cache/conftool/dbconfig/20221017-131918-ladsgroup.json
  • 13:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35503 and previous config saved to /var/cache/conftool/dbconfig/20221017-131911-ladsgroup.json
  • 13:18 urbanecm@deploy1002: urbanecm and mdsshakil: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:17 urbanecm@deploy1002: Started scap: Backport for Enable Sandbox Extension at Bengali Wikiquote (T320903)
  • 13:17 urbanecm@deploy1002: Finished scap: 52821e09c: 35000a4b: dewiktionary: Update logo (T320891) (duration: 04m 03s)
  • 13:13 urbanecm@deploy1002: Started scap: 52821e09c: 35000a4b: dewiktionary: Update logo (T320891)
  • 13:10 urbanecm@deploy1002: Finished scap: b434c5a84: 9d10a60ea: Wordmark changes (T320944, T320840) (duration: 04m 32s)
  • 13:06 root@cumin1001: END (PASS) - Cookbook sre.discovery.service-route (exit_code=0)
  • 13:06 root@cumin1001: START - Cookbook sre.discovery.service-route
  • 13:05 urbanecm@deploy1002: Started scap: b434c5a84: 9d10a60ea: Wordmark changes (T320944, T320840)
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2105 (T318955)', diff saved to https://phabricator.wikimedia.org/P35502 and previous config saved to /var/cache/conftool/dbconfig/20221017-130440-ladsgroup.json
  • 13:04 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2105.codfw.wmnet with reason: Maintenance
  • 13:04 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35501 and previous config saved to /var/cache/conftool/dbconfig/20221017-130412-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2159 (T318950)', diff saved to https://phabricator.wikimedia.org/P35500 and previous config saved to /var/cache/conftool/dbconfig/20221017-130154-ladsgroup.json
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on db2095.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 12:36 XioNoX: re-enable BGP between cr1 and lsw1-e1 - T320566
  • 11:41 topranks: moving port et-2/0/49 out of ae1 bundle asw2-d-eqiad
  • 11:38 topranks: moving et-1/1/3 out of ae bundle on cr1-eqiad
  • 11:11 XioNoX: cr1-eqiad> request chassis fpc slot 1 offline - T320566
  • 11:09 topranks: shutting down BGP sessions from cr1-eqiad to lsw1-e1-eqiad in advance of linecard reboot
  • 10:27 XioNoX: disable cr1-eqiad:ae4 for recabling and troubleshooting - T320566
  • 09:48 jynus: powercycle db1202
  • 09:24 XioNoX: de-pref eqiad-drmrs GTT VPLS (latency between eqiad and drmrs will increase) - T320566
  • 09:09 XioNoX: de-pref cr1-eqiad wavelength transports (to codfw and drmrs) - T320566
  • 08:55 XioNoX: Move all eqiad VRRP mastership to cr2 - T320566
  • 08:03 jynus: restarting several bacula-related daemons to update its configuration
  • 07:57 urbanecm@deploy1002: Finished scap: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728) (duration: 07m 22s)
  • 07:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:50 urbanecm@deploy1002: Started scap: Backport for Mentee filters: always use mw.user.options values to initialise the mentees store (T320728)
  • 07:38 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 07:38 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 07:38 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 07:37 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 07:34 Emperor: set thanos ring replicas to 3.60 T311690
  • 07:01 elukey: powercycle parse1002 - serial console's tty not responding, OEM events registered in `racadm getsel`
  • 06:14 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 16276
  • 06:08 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 16276

2022-10-15

  • 23:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1131 T320879', diff saved to https://phabricator.wikimedia.org/P35497 and previous config saved to /var/cache/conftool/dbconfig/20221015-232716-ladsgroup.json
  • 23:23 ladsgroup@cumin1001: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T320879', diff saved to https://phabricator.wikimedia.org/P35495 and previous config saved to /var/cache/conftool/dbconfig/20221015-232320-ladsgroup.json
  • 23:22 Amir1: Starting s6 eqiad failover from db1131 to db1173 - T320879
  • 23:10 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T320879
  • 23:10 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T320879

2022-10-14

  • 22:56 mutante: pcc-worker1003.puppet-diffs.eqiad1.wikimedia.cloud - out of disk space again - deleted 3.5GB job "1460" to unblock puppet compiling
  • 20:48 jhathaway@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 20:48 jhathaway@cumin1001: START - Cookbook sre.network.cf
  • 19:57 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:57 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 19:55 oblivian@cumin1001: END (PASS) - Cookbook sre.network.cf (exit_code=0)
  • 19:55 oblivian@cumin1001: START - Cookbook sre.network.cf
  • 18:08 mutante: contint* - temp disabled puppet, deploying gerrit:834400, docker version upgrade on CI servers (T318382)
  • 15:49 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:48 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 15:46 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 15:45 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:43 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:40 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:48 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:47 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 14:43 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:42 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:40 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 14:32 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 14:29 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:29 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 14:27 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:27 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 14:27 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:22 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 14:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:19 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:17 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:09 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:09 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 14:06 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 13:59 jclark@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:57 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 13:57 jclark@cumin1001: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:55 jclark@cumin1001: START - Cookbook sre.dns.netbox
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1202 - Degraded RAID (T320786)', diff saved to https://phabricator.wikimedia.org/P35487 and previous config saved to /var/cache/conftool/dbconfig/20221014-120155-ladsgroup.json
  • 10:22 godog: upgrade grafana to 8.5.14
  • 10:15 dcausse: Deployed patch for T320785
  • 08:47 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[2028-2030].codfw.wmnet
  • 08:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:46 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 08:31 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2028-2030].codfw.wmnet
  • 08:29 moritzm: installing git security updates on buster
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1008.eqiad.wmnet
  • 08:15 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts elastic[2025-2027].codfw.wmnet
  • 08:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:14 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 08:12 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:07 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1008.eqiad.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1007.eqiad.wmnet
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:02 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1007.eqiad.wmnet
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1006.eqiad.wmnet
  • 07:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:54 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:45 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1006.eqiad.wmnet
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti1005.eqiad.wmnet
  • 07:43 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:41 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:37 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2025-2027].codfw.wmnet
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1005.eqiad.wmnet
  • 07:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 07:16 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1008.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 06:46 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 7843
  • 06:45 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 7843
  • 06:37 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Not working well
  • 06:37 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1143.eqiad.wmnet with reason: Not working well
  • 03:42 oblivian@cumin1001: dbctl commit (dc=all): 'depool db1143, lagging', diff saved to https://phabricator.wikimedia.org/P35485 and previous config saved to /var/cache/conftool/dbconfig/20221014-034223-oblivian.json
  • 02:24 tstarling@deploy1002: Synchronized wmf-config: clean up deleted file (duration: 03m 46s)
  • 02:11 ryankemper: T300943 Decom of elastic20[25-36] complete. Merged https://gerrit.wikimedia.org/r/c/operations/puppet/+/842547. This is done
  • 02:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2034,2036].codfw.wmnet
  • 02:10 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 02:05 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 02:01 ryankemper: T300943 Final batch of decom'ing `elastic20[25-36]` => already decommissioned rows A/B/C; starting final row D (corresponding to `203[4,6]`)
  • 01:59 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2034,2036].codfw.wmnet
  • 01:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2031-2033].codfw.wmnet
  • 01:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:40 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 01:32 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2031-2033].codfw.wmnet
  • 01:26 tstarling@deploy1002: Synchronized wmf-config/CommonSettings.php: (no justification provided) (duration: 03m 36s)
  • 01:20 tstarling@deploy1002: Synchronized wmf-config/UcfirstOverrides.php: for T292552, should have no effect at this stage (duration: 03m 46s)
  • 01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2028-2030]
  • 01:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:13 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 00:50 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2028-2030]
  • 00:49 ryankemper: [Elastic] `ryankemper@elastic1083:~$ sudo systemctl restart elasticsearch_7*` to clear `CirrusSearchJVMGCYoungPoolInsufficient`
  • 00:48 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts elastic[2025-2027]
  • 00:48 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:45 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2026.codfw.wmnet
  • 00:44 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 00:43 ryankemper@puppetmaster1001: conftool action : set/weight=10:pooled=no; selector: name=elastic2025*
  • 00:36 ryankemper: T300943 Decom'ing elastic20[25-36]. Decommissioning in batches by row, starting with row A (2025-27)
  • 00:29 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts elastic[2025-2027]

2022-10-13

  • 22:02 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:35 ebernhardson@deploy1002: Finished deploy [wdqs/wdqs@b5b51fa]: 0.3.117 and adding eu knowledge graph to whitelist (duration: 12m 02s)
  • 20:33 TheresNoTime: close UTC late backport window
  • 20:31 samtar@deploy1002: Finished scap: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752) (duration: 05m 24s)
  • 20:26 samtar@deploy1002: samtar and lucaswerkmeister: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:25 samtar@deploy1002: Started scap: Backport for testcommonswiki: Add editcontentmodel to interface-admin (T320752)
  • 20:23 samtar@deploy1002: Finished scap: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752) (duration: 05m 03s)
  • 20:22 ebernhardson@deploy1002: Started deploy [wdqs/wdqs@b5b51fa]: 0.3.117 and adding eu knowledge graph to whitelist
  • 20:19 samtar@deploy1002: samtar and nn1l2: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:18 samtar@deploy1002: Started scap: Backport for commonswiki: add editcontentmodel right to interface-admin group (T320752)
  • 20:13 samtar@deploy1002: Finished scap: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases (duration: 05m 31s)
  • 20:08 samtar@deploy1002: samtar and ebernhardson: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:08 samtar@deploy1002: Started scap: Backport for cirrus: Drop client side connect timeout config (T143553), cirrus: remove cross-dc poolcounter increases
  • 19:38 mutante: rsyncing /srv/repos from phab1001 to 3 other phab servers (with bw limit) - T313360
  • 18:08 dduvall@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.5 refs T314194
  • 17:12 dduvall: disabling puppet on gitlab-runner1002 to debug jwt auth failure
  • 17:11 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:10 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:10 sukhe: disable Puppet and stop Pybal on lvs1017: T286881
  • 17:10 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:50 sukhe: disable Puppet and stop Pybal on lvs1020: T286881
  • 16:26 moritzm: draining ganeti1008 T320419
  • 16:17 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:59 topranks: Adjusting MTU on link from lsw1-e3-eqiad to lsw1-f1-eqiad (drained in advance)
  • 15:29 vgutierrez: partitioning the ATS cache in cp[2027-2028], cp[1075-1076], cp5007, cp[3050-3051] - T317748
  • 15:24 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:23 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:03 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 15:03 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:59 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=varnish-fe
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-tls
  • 14:36 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4045.ulsfo.wmnet,service=ats-be
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4045.ulsfo.wmnet,service=varnish-fe
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=1; selector: name=cp4045.ulsfo.wmnet,service=ats-tls
  • 14:36 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: name=cp4045.ulsfo.wmnet,service=ats-be
  • 14:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35482 and previous config saved to /var/cache/conftool/dbconfig/20221013-142730-ladsgroup.json
  • 14:26 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P35481 and previous config saved to /var/cache/conftool/dbconfig/20221013-141224-ladsgroup.json
  • 14:06 sukhe: running puppet/utils/pcc_update_facts.py to update nodes
  • 14:06 urbanecm@deploy1002: backport aborted: (duration: 00m 04s)
  • 14:04 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:04 lucaswerkmeister-wmde@deploy1002: Finished scap: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis" (duration: 05m 49s)
  • 13:59 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and trainbranchbot: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:58 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for Revert "GrowthExperiments: enable the Vue version of the mentee overview in all wikis"
  • 13:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P35480 and previous config saved to /var/cache/conftool/dbconfig/20221013-135718-ladsgroup.json
  • 13:56 lucaswerkmeister-wmde@deploy1002: Sync cancelled.
  • 13:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:52 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 13:47 lucaswerkmeister-wmde@deploy1002: lucaswerkmeister-wmde and sgimeno: Backport for GrowthExperiments: enable the Vue version of the mentee overview in all wikis (T300532) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:47 lucaswerkmeister-wmde@deploy1002: Started scap: Backport for GrowthExperiments: enable the Vue version of the mentee overview in all wikis (T300532)
  • 13:45 mlitn@deploy1002: Finished scap: Backport for Commons files can have thumbnails too (duration: 04m 53s)
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35479 and previous config saved to /var/cache/conftool/dbconfig/20221013-134211-ladsgroup.json
  • 13:41 mlitn@deploy1002: mlitn and mlitn: Backport for Commons files can have thumbnails too synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:40 mlitn@deploy1002: Started scap: Backport for Commons files can have thumbnails too
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2150 (T318950)', diff saved to https://phabricator.wikimedia.org/P35478 and previous config saved to /var/cache/conftool/dbconfig/20221013-133953-ladsgroup.json
  • 13:39 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35477 and previous config saved to /var/cache/conftool/dbconfig/20221013-133931-ladsgroup.json
  • 13:27 mlitn@deploy1002: Finished scap: Backport for Commons files can have thumbnails too (duration: 05m 15s)
  • 13:24 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P35473 and previous config saved to /var/cache/conftool/dbconfig/20221013-132425-ladsgroup.json
  • 13:24 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:22 mlitn@deploy1002: mlitn and mlitn: Backport for Commons files can have thumbnails too synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:22 mlitn@deploy1002: Started scap: Backport for Commons files can have thumbnails too
  • 13:16 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:16 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:15 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:15 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P35472 and previous config saved to /var/cache/conftool/dbconfig/20221013-130918-ladsgroup.json
  • 12:57 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35471 and previous config saved to /var/cache/conftool/dbconfig/20221013-125412-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2122 (T318950)', diff saved to https://phabricator.wikimedia.org/P35470 and previous config saved to /var/cache/conftool/dbconfig/20221013-125154-ladsgroup.json
  • 12:51 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35469 and previous config saved to /var/cache/conftool/dbconfig/20221013-125133-ladsgroup.json
  • 12:45 jbond@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:43 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:37 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:37 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 12:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35468 and previous config saved to /var/cache/conftool/dbconfig/20221013-123626-ladsgroup.json
  • 12:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35467 and previous config saved to /var/cache/conftool/dbconfig/20221013-122120-ladsgroup.json
  • 12:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35466 and previous config saved to /var/cache/conftool/dbconfig/20221013-120613-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T318950)', diff saved to https://phabricator.wikimedia.org/P35465 and previous config saved to /var/cache/conftool/dbconfig/20221013-120356-ladsgroup.json
  • 12:03 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35464 and previous config saved to /var/cache/conftool/dbconfig/20221013-120334-ladsgroup.json
  • 12:02 moritzm: restarting FPM/Apache on mediawiki canaries to pick up new curl
  • 11:58 moritzm: installing curl security updates on buster
  • 11:48 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P35462 and previous config saved to /var/cache/conftool/dbconfig/20221013-114827-ladsgroup.json
  • 11:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubetcd1005.eqiad.wmnet to plain
  • 11:46 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubetcd1005.eqiad.wmnet to plain
  • 11:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubetcd1005.eqiad.wmnet to drbd
  • 11:37 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 04m 41s)
  • 11:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2120', diff saved to https://phabricator.wikimedia.org/P35461 and previous config saved to /var/cache/conftool/dbconfig/20221013-113320-ladsgroup.json
  • 11:33 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:31 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 03m 12s)
  • 11:30 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubetcd1005.eqiad.wmnet to drbd
  • 11:28 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:28 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 03m 34s)
  • 11:27 Lucas_WMDE: 11:18 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35460 and previous config saved to /var/cache/conftool/dbconfig/20221013-111814-ladsgroup.json
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 dbctl commit (dc=all): 'Depooling db2120 (T318950)', diff saved to https://phabricator.wikimedia.org/P35458 and previous config saved to /var/cache/conftool/dbconfig/20221013-111556-ladsgroup.json
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 START - Cookbook sre.hosts.downtime for 6:00:00 on db2120.codfw.wmnet with reason: Maintenance
  • 11:27 Lucas_WMDE: 11:15 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35457 and previous config saved to /var/cache/conftool/dbconfig/20221013-111534-ladsgroup.json
  • 11:26 Lucas_WMDE: repeating five messages that got missed due to stashbot quit
  • 11:24 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:24 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels (duration: 18m 30s)
  • 11:06 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext and updated wheels
  • 11:00 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P35455 and previous config saved to /var/cache/conftool/dbconfig/20221013-110028-ladsgroup.json
  • 10:45 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108', diff saved to https://phabricator.wikimedia.org/P35454 and previous config saved to /var/cache/conftool/dbconfig/20221013-104521-ladsgroup.json
  • 10:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35453 and previous config saved to /var/cache/conftool/dbconfig/20221013-103015-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2108 (T318950)', diff saved to https://phabricator.wikimedia.org/P35452 and previous config saved to /var/cache/conftool/dbconfig/20221013-102757-ladsgroup.json
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2108.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2100.codfw.wmnet with reason: Maintenance
  • 10:27 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:26 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 10:16 moritzm: draining ganeti1008 T320419
  • 09:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1017.eqiad.wmnet to cluster eqiad and group B
  • 09:33 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1017.eqiad.wmnet to cluster eqiad and group B
  • 09:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 09:28 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:28 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:25 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 09:18 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:16 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:15 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:15 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:15 ayounsi@deploy1002: Finished deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext (duration: 02m 04s)
  • 09:14 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:14 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 09:13 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 09:13 ayounsi@deploy1002: Started deploy [netbox/deploy@7bbf659]: Deploy ntc-netbox-plugin-metrics-ext
  • 09:11 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1017.eqiad.wmnet with OS bullseye
  • 08:48 vgutierrez: partitioning the ATS cache in cp[2029-2030], cp[6001,6009], cp[1077-1078], cp[5002,5008], cp[3052-3053], cp4022 - T317748
  • 08:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1017.eqiad.wmnet with reason: host reimage
  • 08:28 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:28 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:27 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:16 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1017.eqiad.wmnet with OS bullseye
  • 08:13 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS buster
  • 08:13 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:12 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:11 oblivian@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from cluster for reimage
  • 08:04 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1017.eqiad.wmnet with reason: Remove from cluster for reimage
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 08:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1007.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 07:50 matthiasmullie: UTC morning backports done
  • 07:37 mlitn@deploy1002: Finished scap: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510) (duration: 08m 24s)
  • 07:29 mlitn@deploy1002: mlitn and mlitn: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:29 mlitn@deploy1002: Started scap: Backport for Enable NS_MAIN thumbnails only on wikipedias (T320510)
  • 00:58 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4045.ulsfo.wmnet with OS buster
  • 00:38 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 00:34 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 00:15 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster

2022-10-12

  • 21:06 cwhite: clean up old db backups on grafana2001
  • 20:41 TheresNoTime: closing UTC late backport window
  • 20:40 samtar@deploy1002: Finished scap: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961) (duration: 05m 17s)
  • 20:35 samtar@deploy1002: samtar and stang: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:35 samtar@deploy1002: Started scap: Backport for yiwiktionary: Adjust width-height ratio of logo to fix display issue (T310961)
  • 20:21 samtar@deploy1002: Finished scap: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705) (duration: 04m 53s)
  • 20:16 samtar@deploy1002: samtar and stang: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:16 samtar@deploy1002: Started scap: Backport for Drop unused wordmark/tagline (T307705), Re-download and optimize wordmark/tagline svg file (T307705)
  • 20:14 samtar@deploy1002: Finished scap: Backport for Set $wgSitename for bnwikiquote (T319183) (duration: 04m 40s)
  • 20:10 samtar@deploy1002: samtar and zabe: Backport for Set $wgSitename for bnwikiquote (T319183) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 20:09 samtar@deploy1002: Started scap: Backport for Set $wgSitename for bnwikiquote (T319183)
  • 20:08 samtar@deploy1002: Finished scap: Backport for Register the editattempt_block schema (T310390) (duration: 05m 42s)
  • 20:03 samtar@deploy1002: samtar and kemayo: Backport for Register the editattempt_block schema (T310390) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 20:02 samtar@deploy1002: Started scap: Backport for Register the editattempt_block schema (T310390)
  • 19:29 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 19:16 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 19:15 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
  • 18:41 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 18:40 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 18:26 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 18:12 dduvall@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.5 refs T314194 (duration: 03m 38s)
  • 18:09 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.5 refs T314194
  • 18:03 dduvall@deploy1002: deploy-promote aborted: (duration: 00m 07s)
  • 17:07 dduvall@deploy1002: helmfile [eqiad] DONE helmfile.d/services/blubberoid: apply
  • 17:07 dduvall@deploy1002: helmfile [eqiad] START helmfile.d/services/blubberoid: apply
  • 17:06 dduvall@deploy1002: helmfile [codfw] DONE helmfile.d/services/blubberoid: apply
  • 17:06 dduvall@deploy1002: helmfile [codfw] START helmfile.d/services/blubberoid: apply
  • 17:02 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 17:00 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 17:00 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:55 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] DONE helmfile.d/services/blubberoid: apply
  • 16:55 dduvall@deploy1002: helmfile [staging] START helmfile.d/services/blubberoid: apply
  • 16:19 volans@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cp4045.ulsfo.wmnet with OS buster
  • 15:45 vgutierrez: partitioning the ATS cache in cp[2031-2032], cp[6002,6010], cp[1079-1080], cp[5003,5009], cp[3054-3055], cp[4023,4032] - T317748
  • 15:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35445 and previous config saved to /var/cache/conftool/dbconfig/20221012-154230-ladsgroup.json
  • 15:37 urbanecm@deploy1002: Finished scap: Backport for eswiki: Deploy mentorship to only 15% of users (T285235) (duration: 04m 23s)
  • 15:33 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal,name=eqiad
  • 15:33 urbanecm@deploy1002: urbanecm and urbanecm: Backport for eswiki: Deploy mentorship to only 15% of users (T285235) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 15:33 urbanecm@deploy1002: Started scap: Backport for eswiki: Deploy mentorship to only 15% of users (T285235)
  • 15:31 hnowlan@deploy1002: Finished deploy [restbase/deploy@2d002b3]: Add ig,bcl,bn,tl wikiquote, ig wiktionary T314641 (duration: 16m 02s)
  • 15:30 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:30 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:28 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 15:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P35444 and previous config saved to /var/cache/conftool/dbconfig/20221012-152724-ladsgroup.json
  • 15:26 ottomata: remove materialized .json files from schemas/event/primary - this should be a no-op as no clients should actually be using the json files. - T315674
  • 15:25 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4045.ulsfo.wmnet with reason: host reimage
  • 15:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:23 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 15:23 claime: redeploying eventstreams-internal eqiad - T310721
  • 15:16 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal,name=eqiad
  • 15:16 claime: depooling eventstreams-internal eqiad - T310721
  • 15:15 hnowlan@deploy1002: Started deploy [restbase/deploy@2d002b3]: Add ig,bcl,bn,tl wikiquote, ig wiktionary T314641
  • 15:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P35443 and previous config saved to /var/cache/conftool/dbconfig/20221012-151217-ladsgroup.json
  • 15:09 claime: repooled eventstreams-internal codfw - T310721
  • 15:09 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams-internal,name=codfw
  • 15:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 15:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 15:07 claime: redeploying eventstreams-internal codfw - T310721
  • 15:03 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 14:57 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams-internal,name=codfw
  • 14:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35442 and previous config saved to /var/cache/conftool/dbconfig/20221012-145711-ladsgroup.json
  • 14:57 claime: depooling eventstreams-internal codfw - T310721
  • 14:56 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2178 (T318955)', diff saved to https://phabricator.wikimedia.org/P35441 and previous config saved to /var/cache/conftool/dbconfig/20221012-145445-ladsgroup.json
  • 14:54 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 14:54 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35440 and previous config saved to /var/cache/conftool/dbconfig/20221012-145423-ladsgroup.json
  • 14:39 oblivian@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:39 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P35439 and previous config saved to /var/cache/conftool/dbconfig/20221012-143917-ladsgroup.json
  • 14:39 oblivian@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:35 ladsgroup@deploy1002: Finished scap: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db" (duration: 04m 37s)
  • 14:31 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db" synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 14:30 ladsgroup@deploy1002: Started scap: Backport for Revert "rdbms: Instead of reconfiguring all of LB, just remove depooled db"
  • 14:24 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315', diff saved to https://phabricator.wikimedia.org/P35438 and previous config saved to /var/cache/conftool/dbconfig/20221012-142410-ladsgroup.json
  • 14:19 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 14:18 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4045.ulsfo.wmnet with OS buster
  • 14:09 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35436 and previous config saved to /var/cache/conftool/dbconfig/20221012-140903-ladsgroup.json
  • 14:08 ladsgroup@deploy1002: Sync cancelled.
  • 14:07 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repool db1175', diff saved to https://phabricator.wikimedia.org/P35435 and previous config saved to /var/cache/conftool/dbconfig/20221012-140746-ladsgroup.json
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depool db1175', diff saved to https://phabricator.wikimedia.org/P35434 and previous config saved to /var/cache/conftool/dbconfig/20221012-140626-ladsgroup.json
  • 14:04 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:53 ladsgroup@deploy1002: ladsgroup and ladsgroup: Backport for rdbms: Instead of reconfiguring all of LB, just remove depooled db (T298485) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 13:53 ladsgroup@deploy1002: Started scap: Backport for rdbms: Instead of reconfiguring all of LB, just remove depooled db (T298485)
  • 13:49 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 13:47 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:43 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2171:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35433 and previous config saved to /var/cache/conftool/dbconfig/20221012-134306-ladsgroup.json
  • 13:43 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 13:42 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35432 and previous config saved to /var/cache/conftool/dbconfig/20221012-134245-ladsgroup.json
  • 13:27 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P35431 and previous config saved to /var/cache/conftool/dbconfig/20221012-132738-ladsgroup.json
  • 13:27 urbanecm@deploy1002: Finished scap: Backport for Remove Research Incentive survey from eswiki (T318331) (duration: 05m 21s)
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts d-i-test.eqiad.wmnet
  • 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:22 urbanecm@deploy1002: urbanecm and dani: Backport for Remove Research Incentive survey from eswiki (T318331) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 13:21 urbanecm@deploy1002: Started scap: Backport for Remove Research Incentive survey from eswiki (T318331)
  • 13:21 urbanecm@deploy1002: Finished scap: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705) (duration: 07m 06s)
  • 13:18 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts d-i-test.eqiad.wmnet
  • 13:14 urbanecm@deploy1002: urbanecm and stang: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:14 urbanecm@deploy1002: Started scap: Backport for Move wmgSiteLogoWordmark and wmgSiteLogoTagline to logos.php (T307705)
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for Enable show nearby feature on a small group of wikis (T316782) (duration: 07m 03s)
  • 13:12 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P35430 and previous config saved to /var/cache/conftool/dbconfig/20221012-131232-ladsgroup.json
  • 13:09 moritzm: draining ganeti1007 T320419
  • 13:06 urbanecm@deploy1002: urbanecm and wmde-fisch: Backport for Enable show nearby feature on a small group of wikis (T316782) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet
  • 13:06 urbanecm@deploy1002: Started scap: Backport for Enable show nearby feature on a small group of wikis (T316782)
  • 13:05 urbanecm@deploy1002: backport aborted: (duration: 00m 09s)
  • 13:04 urbanecm@deploy1002: Backport cancelled.
  • 12:57 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35429 and previous config saved to /var/cache/conftool/dbconfig/20221012-125725-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2157 (T318955)', diff saved to https://phabricator.wikimedia.org/P35428 and previous config saved to /var/cache/conftool/dbconfig/20221012-123223-ladsgroup.json
  • 12:32 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35427 and previous config saved to /var/cache/conftool/dbconfig/20221012-123201-ladsgroup.json
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:25 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 12:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35426 and previous config saved to /var/cache/conftool/dbconfig/20221012-121655-ladsgroup.json
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS buster
  • 12:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315', diff saved to https://phabricator.wikimedia.org/P35425 and previous config saved to /var/cache/conftool/dbconfig/20221012-120148-ladsgroup.json
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1005.eqiad.wmnet with reason: Remove from cluster for eventual decom
  • 11:51 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=eqiad
  • 11:50 claime: repooling eventstreams in eqiad - T310721
  • 11:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35424 and previous config saved to /var/cache/conftool/dbconfig/20221012-114642-ladsgroup.json
  • 11:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:45 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:44 claime: redeploying eventstreams eqiad - T310721
  • 11:24 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams,name=eqiad
  • 11:24 claime: depooling eventstreams in eqiad - T310721
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2137:3315 (T318955)', diff saved to https://phabricator.wikimedia.org/P35423 and previous config saved to /var/cache/conftool/dbconfig/20221012-112146-ladsgroup.json
  • 11:21 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 11:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35422 and previous config saved to /var/cache/conftool/dbconfig/20221012-112124-ladsgroup.json
  • 11:11 moritzm: installing bind9 security updates on buster (client side tools/libs)
  • 11:07 jgiannelos@deploy1002: Finished deploy [restbase/deploy@0474832]: Update restbase to 1a02cdfb (duration: 25m 48s)
  • 11:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P35421 and previous config saved to /var/cache/conftool/dbconfig/20221012-110617-ladsgroup.json
  • 11:02 claime: repooled eventstreams in codfw - T310721
  • 11:01 cgoubert@cumin1001: conftool action : set/pooled=true; selector: dnsdisc=eventstreams,name=codfw
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 10:57 claime: redeploying eventstreams codfw - T310721
  • 10:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:51 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P35420 and previous config saved to /var/cache/conftool/dbconfig/20221012-105111-ladsgroup.json
  • 10:49 moritzm: installing dbus security updates
  • 10:41 jgiannelos@deploy1002: Started deploy [restbase/deploy@0474832]: Update restbase to 1a02cdfb
  • 10:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35419 and previous config saved to /var/cache/conftool/dbconfig/20221012-103604-ladsgroup.json
  • 10:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 10:33 cgoubert@cumin1001: conftool action : set/pooled=false; selector: dnsdisc=eventstreams,name=codfw
  • 10:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2128 (T318955)', diff saved to https://phabricator.wikimedia.org/P35418 and previous config saved to /var/cache/conftool/dbconfig/20221012-103338-ladsgroup.json
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 10:33 claime: depooling eventstreams in codfw - T310721
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 16:00:00 on db2094.codfw.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 10:22 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20115
  • 10:20 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 20115
  • 10:01 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 10:01 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 09:26 moritzm: draining ganeti1017 T311687
  • 09:25 urbanecm@deploy1002: Finished scap: Backport for Replace wordmark/tagline with correct naming style (T307705) (duration: 04m 20s)
  • 09:21 urbanecm@deploy1002: urbanecm and stang: Backport for Replace wordmark/tagline with correct naming style (T307705) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 09:21 urbanecm@deploy1002: Started scap: Backport for Replace wordmark/tagline with correct naming style (T307705)
  • 09:12 jayme: re-enabled puppet on all kubernetes masters (incl. ml & dse)
  • 09:11 urbanecm@deploy1002: Finished scap: Backport for SVG resources: Run svgo (T320447) (duration: 04m 38s)
  • 09:07 urbanecm@deploy1002: urbanecm and urbanecm: Backport for SVG resources: Run svgo (T320447) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 09:07 urbanecm@deploy1002: Started scap: Backport for SVG resources: Run svgo (T320447)
  • 09:05 jayme: disabling puppet on all kubernetes masters (incl. ml & dse)
  • 08:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
  • 08:58 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
  • 08:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1004.eqiad.wmnet to plain
  • 08:53 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1004.eqiad.wmnet to plain
  • 08:52 vgutierrez: partitioning the ATS cache in cp[2033-2034], cp[6003,6011], cp[1081-1082], cp[5004,5010], cp[3056-3057], cp[4024,4028] - T317748
  • 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
  • 08:33 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagetcd1004.eqiad.wmnet to drbd
  • 08:18 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagetcd1004.eqiad.wmnet to drbd
  • 08:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2001.codfw.wmnet to drbd
  • 08:07 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2001.codfw.wmnet to drbd
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2001.codfw.wmnet to plain
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2001.codfw.wmnet to plain
  • 07:25 matthiasmullie: UTC morning backports done
  • 07:25 mlitn@deploy1002: Finished scap: Backport for Rescale images based on width alone (T320406) (duration: 05m 19s)
  • 07:20 mlitn@deploy1002: mlitn and mlitn: Backport for Rescale images based on width alone (T320406) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:19 mlitn@deploy1002: Started scap: Backport for Rescale images based on width alone (T320406)
  • 06:17 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 4826
  • 06:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 4826

2022-10-11

  • 21:36 mutante: phab1001 / phab2001 - temp. disabled puppet; stopped ssh-phab service; scheduled icinga downtimes for ssh-phab pybal backend alerts - effectively "soft shutting down" the service - T296022
  • 21:22 mutante: phab2001 - systemctl stop ssh-phab; temp disable puppet
  • 21:12 mutante: puppetmaster1001: rm .*.err in /var/run/confd-template
  • 21:10 mutante: puppetmaster2001: rm .*.err in /var/run/confd-template
  • 20:56 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:35 dzahn@cumin2002: conftool action : set/pooled=yes; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 20:35 dzahn@cumin2002: conftool action : set/pooled=inactive; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:27 mutante: depooling git-ssh service backends with confctl - T296022
  • 20:26 TheresNoTime: close UTC late backport window
  • 20:26 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=codfw,name=phab2001-vcs.codfw.wmnet
  • 20:25 mutante: depooling git-ssh service backends - checking if monitoring will alert
  • 20:25 dzahn@cumin2002: conftool action : set/pooled=no; selector: dc=eqiad,name=phab1001-vcs.eqiad.wmnet
  • 20:11 samtar@deploy1002: Finished scap: Backport for Undeploy the GDI wave 3 survey from PROD (T320495) (duration: 06m 29s)
  • 20:05 samtar@deploy1002: samtar and essexigyan: Backport for Undeploy the GDI wave 3 survey from PROD (T320495) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:05 samtar@deploy1002: Started scap: Backport for Undeploy the GDI wave 3 survey from PROD (T320495)
  • 19:46 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:39 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:16 dcausse: restarting blazegraph on wdqs1013 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 18:13 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35414 and previous config saved to /var/cache/conftool/dbconfig/20221011-181348-ladsgroup.json
  • 18:11 XioNoX: re-enable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463
  • 18:09 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ganeti4001.ulsfo.wmnet
  • 18:09 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.5 refs T314194
  • 18:07 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 18:05 ejegg: updated fundraising python tools from 14d60435 to 4c143d97
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4001.ulsfo.wmnet
  • 18:00 sukhe: sudo gnt-node remove ganeti4001.ulsfo.wmnet
  • 17:58 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106', diff saved to https://phabricator.wikimedia.org/P35413 and previous config saved to /var/cache/conftool/dbconfig/20221011-175842-ladsgroup.json
  • 17:46 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35412 and previous config saved to /var/cache/conftool/dbconfig/20221011-174641-ladsgroup.json
  • 17:37 sukhe: completed homer run for "cr*-ulsfo*" commit 841533
  • 17:35 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 841533: sites.yaml: add dns4004 to anycast_neighbors"
  • 17:31 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P35411 and previous config saved to /var/cache/conftool/dbconfig/20221011-173134-ladsgroup.json
  • 17:28 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35410 and previous config saved to /var/cache/conftool/dbconfig/20221011-172822-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db1106 (T318959)', diff saved to https://phabricator.wikimedia.org/P35409 and previous config saved to /var/cache/conftool/dbconfig/20221011-172608-ladsgroup.json
  • 17:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 6:00:00 on db1106.eqiad.wmnet with reason: Maintenance
  • 17:16 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P35408 and previous config saved to /var/cache/conftool/dbconfig/20221011-171627-ladsgroup.json
  • 17:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:01 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35407 and previous config saved to /var/cache/conftool/dbconfig/20221011-170121-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2123 (T318955)', diff saved to https://phabricator.wikimedia.org/P35406 and previous config saved to /var/cache/conftool/dbconfig/20221011-165955-ladsgroup.json
  • 16:59 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 16:59 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35405 and previous config saved to /var/cache/conftool/dbconfig/20221011-165933-ladsgroup.json
  • 16:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:55 bking@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on elastic2052.codfw.wmnet with reason: T320482
  • 16:54 sukhe: depool and reboot doh1001
  • 16:54 bking@cumin1001: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on elastic2052.codfw.wmnet with reason: T320482
  • 16:44 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P35404 and previous config saved to /var/cache/conftool/dbconfig/20221011-164427-ladsgroup.json
  • 16:40 rzl: gitlab-runner[1002-1004,2002-2004] - systemctl restart buildkitd - T317997
  • 16:29 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111', diff saved to https://phabricator.wikimedia.org/P35403 and previous config saved to /var/cache/conftool/dbconfig/20221011-162920-ladsgroup.json
  • 16:26 dduvall@deploy1002: Pruned MediaWiki: 1.40.0-wmf.3 (duration: 02m 00s)
  • 16:23 dduvall@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.5 refs T314194 (duration: 33m 55s)
  • 16:23 volans@cumin2002: conftool action : set/pooled=no; selector: name=elastic2052..*
  • 16:16 ebernhardson: depool elastic2052. failing to join cluster due to `PROBLEM - MD RAID on elastic2052 is CRITICAL: CRITICAL: State: degraded, Active: 5, Working: 5, Failed: 1, Spare: 0`
  • 16:14 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35402 and previous config saved to /var/cache/conftool/dbconfig/20221011-161414-ladsgroup.json
  • 16:00 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:50 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:50 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:50 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4004.wikimedia.org with OS buster
  • 15:50 dduvall@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.5 refs T314194
  • 15:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:49 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2111 (T318955)', diff saved to https://phabricator.wikimedia.org/P35401 and previous config saved to /var/cache/conftool/dbconfig/20221011-154934-ladsgroup.json
  • 15:49 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:49 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2111.codfw.wmnet with reason: Maintenance
  • 15:44 ottomata: remove materialized .json files from schemas/event/secondary - this should be a no-op as no clients should actually be using the json files. - T315674
  • 15:38 sukhe: sudo gnt-node evacuate -s ganeti4001.ulsfo.wmnet
  • 15:35 sukhe: sudo gnt-node migrate -f ganeti4001.ulsfo.wmnet
  • 15:34 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:33 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:33 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:32 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:31 TheresNoTime: deployed beta cluster only change, gerrit:841547, for T314294
  • 15:26 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:26 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:25 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2101.codfw.wmnet with reason: Maintenance
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 15:23 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4004.wikimedia.org with reason: host reimage
  • 15:17 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:10 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:09 XioNoX: disable cr1-eqiad<->asw2-d-eqiad link for re-cabling - T313463
  • 15:05 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4004.wikimedia.org with OS buster
  • 15:04 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 15:01 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 14:56 XioNoX: re-enable cr1-eqiad<->asw2-c-eqiad link after optic replacement
  • 14:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4008.ulsfo.wmnet to cluster ulsfo and group 1
  • 14:50 XioNoX: disable cr1-eqiad<->asw2-c-eqiad link for optic replacement
  • 14:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 14:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 14:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:15 sukhe: completed homer run for Gerrit 841501
  • 14:14 sukhe: homer "cr*-ulsfo*" commit "Gerrit 841501: sites.yaml: decom dns4002"
  • 14:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:01 hoo@deploy1002: Finished scap: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751) (duration: 04m 57s)
  • 14:00 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:57 hoo@deploy1002: hoo and hoo: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:56 hoo@deploy1002: Started scap: Backport for updateQueryServiceLag: Add lb(-pool) options for forward compatibility (T315423 T238751)
  • 13:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:19 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:18 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:18 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:17 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Wikistories/extension.json: Backport: Make discovery mode config default to 'off' (T314582) (duration: 03m 48s)
  • 13:14 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:13 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:02 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:01 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:01 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:00 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:59 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:58 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:46 vgutierrez: partitioning the ATS cache in cp[2035-2036], cp[6004,6012], cp[1083-1084], cp[5005,5011], cp[3058-3059], cp[4025,4029] - T317748
  • 12:39 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 12:05 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35397 and previous config saved to /var/cache/conftool/dbconfig/20221011-120514-ladsgroup.json
  • 11:50 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35396 and previous config saved to /var/cache/conftool/dbconfig/20221011-115007-ladsgroup.json
  • 11:35 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110', diff saved to https://phabricator.wikimedia.org/P35395 and previous config saved to /var/cache/conftool/dbconfig/20221011-113501-ladsgroup.json
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
  • 11:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
  • 11:19 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35394 and previous config saved to /var/cache/conftool/dbconfig/20221011-111954-ladsgroup.json
  • 11:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 10:41 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:13 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:12 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:08 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:07 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:06 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 10:02 volans@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 09:57 volans@cumin2002: START - Cookbook sre.hosts.provision for host lvs4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom
  • 09:44 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on ganeti1006.eqiad.wmnet with reason: Remove from cluster for decom
  • 08:53 vgutierrez: partitioning the ATS cache in cp1085, cp1086, cp2037, cp2038, cp3060, cp3061, cp4026, cp4030, cp5006, cp5012, cp6005, cp6013 - T317748
  • 08:37 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host ganeti4008.ulsfo.wmnet
  • 07:41 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:40 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:31 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:30 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:24 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:22 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:21 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:21 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:18 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:18 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:17 ryankemper: [Elastic] Forcing recheck of elastic settings check alerts; expecting a bit of noise as the alerts resolve (hopefully)
  • 07:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 07:17 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:16 ryankemper: [Elastic] Updated cross-cluster remote seeds (masters): `ryankemper@mwmaint1002:~/elastic$ python push_cross_cluster_conf.py https://search.svc.eqiad.wmnet:9[2,4,6]43/_cluster/settings --ccc chi=chi_eqiad_masters.lst psi=psi_eqiad_masters.lst omega=omega_eqiad_masters.lst`
  • 07:15 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:12 elukey@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:11 elukey@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:09 kartik@deploy1002: Finished scap: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156) (duration: 08m 56s)
  • 07:02 kartik@deploy1002: kartik and kartik: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 07:01 kartik@deploy1002: Started scap: Backport for ContentTranslation: Make Mongolian Wikipedia MT stricter by 10% (T319156)
  • 06:54 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:53 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:53 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:52 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:44 elukey: kill leftover process of jmads on stat1005 to allow user cleanup via puppet
  • 06:43 elukey: kill leftover process of nokafor on stat1004 to allow user cleanup via puppet
  • 06:37 elukey: kill leftover process of bmansurov on stat1007 to allow user cleanup via puppet
  • 06:35 XioNoX: delete now unused VC ports on asw2-c4-eqiad - T313384
  • 06:34 elukey: kill leftover process of bmansurov on an-airflow1002 to allow user cleanup via puppet
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:06 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-10-10

  • 21:19 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4004.wikimedia.org with OS bullseye
  • 20:44 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:44 urbanecm@deploy1002: Finished scap: Backport for Resize wordmark and tagline of Bengali Wikibooks (T319320) (duration: 07m 29s)
  • 19:16 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 19:14 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4004
  • 19:14 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4004.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 19:13 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4004
  • 19:07 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4004.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:39 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4002.wikimedia.org
  • 18:39 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:38 robh@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4008.ulsfo.wmnet with OS bullseye
  • 18:35 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 18:30 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4002.wikimedia.org
  • 18:21 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 18:18 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4008.ulsfo.wmnet with reason: host reimage
  • 17:20 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:17 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:17 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:09 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:01 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host ganeti4008.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 16:58 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:56 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 16:56 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4008
  • 16:55 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4008
  • 16:08 urbanecm@deploy1002: urbanecm and urbanecm: Backport for eswiki: Enable Growth mentorship for 25% of new accounts (T285235) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:08 urbanecm@deploy1002: Started scap: Backport for eswiki: Enable Growth mentorship for 25% of new accounts (T285235)
  • 15:34 mforns@deploy1002: Finished deploy [airflow-dags/analytics@60aa96c]: (no justification provided) (duration: 00m 12s)
  • 15:33 mforns@deploy1002: Started deploy [airflow-dags/analytics@60aa96c]: (no justification provided)
  • 14:56 claime: Updating helm3 to 3.9.4-1 on chartmuseum2001.codfw.wmnet,chartmuseum1001.eqiad.wmnet,contint[1001,2001].wikimedia.org,deploy2002.codfw.wmnet,deploy1002.eqiad.wmnet,releases2002.codfw.wmnet,releases1002.eqiad.wmnet
  • 14:06 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35389 and previous config saved to /var/cache/conftool/dbconfig/20221010-140635-ladsgroup.json
  • 14:02 Lucas_WMDE: UTC afternoon backport+config window done # likewise
  • 14:01 Lucas_WMDE: 13:51 ladsgroup@cumin1001 dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35388 and previous config saved to /var/cache/conftool/dbconfig/20221010-135128-ladsgroup.json # re-logging due to stashbot issue
  • 13:36 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P35387 and previous config saved to /var/cache/conftool/dbconfig/20221010-133621-ladsgroup.json
  • 13:31 vgutierrez: partitioning the ATS cache in cp1087, cp1088, cp2039, cp2040, cp3062, cp3063, cp4033, cp4035, cp5013, cp5015, cp6006, cp6014 - T317748
  • 13:23 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:21 moritzm: draining ganeti1006 T320419
  • 13:21 ladsgroup@cumin1001: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35386 and previous config saved to /var/cache/conftool/dbconfig/20221010-132115-ladsgroup.json
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:20 samtar@deploy1002: Finished scap: Backport for trwikivoyage: Install WikiLove extension (T319537) (duration: 06m 55s)
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:13 samtar@deploy1002: samtar and stang: Backport for trwikivoyage: Install WikiLove extension (T319537) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:13 samtar@deploy1002: Started scap: Backport for trwikivoyage: Install WikiLove extension (T319537)
  • 13:11 TheresNoTime: [samtar@mwmaint1002 ~]$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php trwikivoyage wikilove
  • 12:35 moritzm: installing puma security updates
  • 12:28 moritzm: installing jetty9 security updates
  • 12:19 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
  • 12:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
  • 12:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 12:14 moritzm: installing ruby-rack security updates
  • 12:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1014.eqiad.wmnet to cluster eqiad and group B
  • 11:23 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1014.eqiad.wmnet to cluster eqiad and group B
  • 11:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 11:08 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 11:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 11:01 urbanecm@deploy1002: Finished scap: Backport for Update interwiki cache (duration: 04m 13s)
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 11:00 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 11:00 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:57 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Update interwiki cache synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 10:57 urbanecm@deploy1002: Started scap: Backport for Update interwiki cache
  • 10:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:55 urbanecm@deploy1002: Finished scap: Creating igwiktionary (T314635) (duration: 04m 13s)
  • 10:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 10:51 urbanecm@deploy1002: Started scap: Creating igwiktionary (T314635)
  • 10:49 urbanecm@deploy1002: Finished scap: Creating igwikiquote (T314636) (duration: 04m 24s)
  • 10:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:44 urbanecm@deploy1002: Started scap: Creating igwikiquote (T314636)
  • 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:40 urbanecm@deploy1002: Finished scap: Creating bclwikiquote (T316453) (duration: 04m 11s)
  • 10:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:36 urbanecm@deploy1002: Started scap: Creating bclwikiquote (T316453)
  • 10:35 urbanecm@deploy1002: scap failed: FileNotFoundError [Errno 2] Invalid/unavailable version dir: '/srv/mediawiki-staging/php-1.40-0-wmf.4' (duration: 00m 00s)
  • 10:34 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:34 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:32 urbanecm@deploy1002: Finished scap: Creating tlwikiquote (T317107) (duration: 04m 04s)
  • 10:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:28 urbanecm@deploy1002: Started scap: Creating tlwikiquote (T317107)
  • 10:24 urbanecm@deploy1002: Finished scap: Creating bnwikiquote (T319183) (duration: 04m 56s)
  • 10:19 urbanecm@deploy1002: Started scap: Creating bnwikiquote (T319183)
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:14 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:13 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:04 vgutierrez: rolling upgrade to HAProxy 2.4.19 on both text and upload caching clusters
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A
  • 09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1026.eqiad.wmnet to cluster eqiad and group A
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:35 claime: Imported helm3 3.9.4-1 to buster-wikimedia and bullseye-wikimedia
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:33 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2121 (T314041)', diff saved to https://phabricator.wikimedia.org/P35384 and previous config saved to /var/cache/conftool/dbconfig/20221010-093334-ladsgroup.json
  • 09:33 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:33 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: dbctl commit (dc=all): 'Depooling db2110 (T314041)', diff saved to https://phabricator.wikimedia.org/P35383 and previous config saved to /var/cache/conftool/dbconfig/20221010-093041-ladsgroup.json
  • 09:30 ladsgroup@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:30 ladsgroup@cumin1001: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2110.codfw.wmnet with reason: Maintenance
  • 09:26 vgutierrez: partitioning the ATS cache in cp1089, cp1090, cp2041, cp2042, cp3064, cp3065, cp4034, cp4036, cp5014, cp5016, cp6007, cp6015 - T317748
  • 08:28 Emperor: set thanos ring replicas to 3.68 T311690
  • 08:23 jynus: online resizefs of backup1003 bacula partition
  • 08:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
  • 08:11 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
  • 08:09 jynus: online resizefs of backup2003 bacula partition
  • 08:05 jynus: restarting db2100:s7 to apply new buffer pool config
  • 07:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1030.eqiad.wmnet
  • 07:51 jayme: imported kubernetes 1.23.12 to component/kubernetes123 for buster-wikimedia, bullseye-wikimedia - T307943
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 07:45 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 07:43 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 07:43 godog: bounce thanos-compact on thanos-fe2001
  • 07:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 07:39 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 07:37 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 07:35 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 07:34 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:31 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 07:26 elukey: kill hanging process for user bmansurov on deploy1002 to allow proper user cleanup
  • 06:58 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 1211 hosts
  • 06:58 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 1211 hosts
  • 06:56 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Bmansurov out of all services on: 797 hosts
  • 06:56 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Bmansurov out of all services on: 797 hosts
  • 06:06 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 397715
  • 06:06 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 397715

2022-10-08

  • 06:56 hashar: Restarting Gerrit to fix up replication to GitHub - T320305

2022-10-07

  • 21:29 dzahn@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: debugging
  • 21:28 dzahn@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: debugging
  • 19:46 sukhe@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts ganeti4004.ulsfo.wmnet
  • 19:46 sukhe@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:42 sukhe@cumin2002: START - Cookbook sre.dns.netbox
  • 19:37 sukhe@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti4004.ulsfo.wmnet
  • 19:07 sukhe: decommission ganeti4004.ulsfo.wmnet: T317249
  • 19:05 sukhe: sudo gnt-node remove ganeti4004.ulsfo.wmnet T317249
  • 17:51 ryankemper: [Elastic] Updated list of cross-cluster remote seeds for all eqiad/codfw elastic clusters; should resolve `ElasticSearch setting check` alerts
  • 17:20 sukhe: sudo gnt-node evacuate -s ganeti4004.ulsfo.wmnet
  • 17:13 sukhe: migrate ganeti4004: T317249
  • 17:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 17:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:01 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:59 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.4 refs T314193
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:52 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:52 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:51 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:50 brennen@deploy1002: Finished scap: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799) (duration: 06m 41s)
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:46 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:46 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:44 brennen@deploy1002: brennen and kartik: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 16:43 brennen@deploy1002: Started scap: Backport for RecentSignificantEditStore: Force section titles to be an index array (T319799)
  • 16:42 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:42 brennen@deploy1002: Finished scap: Backport for Check whether title actually exists (T319798) (duration: 05m 47s)
  • 16:36 brennen@deploy1002: brennen and brennen: Backport for Check whether title actually exists (T319798) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 16:36 brennen@deploy1002: Started scap: Backport for Check whether title actually exists (T319798)
  • 16:15 brennen: train 1.40.0-wmf.4 (T314193) blockers have patches; after discussion in releng, going ahead with friday deploy in interest of avoiding a scramble during the coming holiday week
  • 15:09 btullis@cumin1001: END (PASS) - Cookbook sre.aqs.roll-restart (exit_code=0) for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:57 btullis@cumin1001: START - Cookbook sre.aqs.roll-restart for AQS aqs cluster: Roll restart of all AQS's nodejs daemons.
  • 14:35 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1002.eqiad.wmnet with OS buster
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS buster
  • 13:08 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 12:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:57 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:56 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS buster
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:49 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS buster
  • 11:27 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:48 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 09:26 elukey: delete calico pods in CrashLoop on dse-k8s-codfw (probably due to the incorrect docker settings)
  • 08:59 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:52 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:44 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:39 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:36 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:35 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:35 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:33 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 08:33 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 08:26 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 08:23 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 08:23 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 08:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1014.eqiad.wmnet with OS bullseye
  • 08:22 vgutierrez: partition ats-be cache in cp6016 - T317748
  • 08:21 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:19 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 08:11 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
  • 08:11 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
  • 08:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1014.eqiad.wmnet with reason: host reimage
  • 08:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1014.eqiad.wmnet with reason: host reimage
  • 07:54 elukey: re-initialize docker on dse-k8s-worker1004 - wrong storage type set (devicemapper instead of overlay2)
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1014.eqiad.wmnet with OS bullseye
  • 07:49 elukey: re-initialize docker on dse-k8s-worker100[5-8] - wrong storage type set (devicemapper instead of overlay2)
  • 07:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1026.eqiad.wmnet with OS bullseye
  • 07:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1026.eqiad.wmnet with reason: host reimage
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1026.eqiad.wmnet with reason: host reimage
  • 07:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 07:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1014.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1026.eqiad.wmnet with OS bullseye
  • 03:54 ejegg: civicrm upgraded from 6156f7cc to 4b9e981a
  • 03:52 ejegg: updated SmashPig standalone deployment from a8bb2212 to f36143f0

2022-10-06

  • 21:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:09 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:08 thcipriani@deploy1002: Finished scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) (duration: 06m 08s)
  • 21:02 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1004.eqiad.wmnet
  • 21:02 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:02 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 21:01 thcipriani@deploy1002: Started scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)
  • 20:58 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:48 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:45 samtar@deploy1002: Finished scap: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025) (duration: 07m 56s)
  • 20:40 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:39 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1004.eqiad.wmnet
  • 20:37 samtar@deploy1002: samtar and samtar: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:37 samtar@deploy1002: Started scap: Backport for Replace promise handling when AfD'ing pages (T238025), Replace promise handling when AfD'ing pages (T238025)
  • 20:36 samtar@deploy1002: Backport cancelled.
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:34 thcipriani@deploy1002: Finished scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) (duration: 09m 51s)
  • 20:33 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
  • 20:33 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:32 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 20:29 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:27 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
  • 20:25 thcipriani@deploy1002: thcipriani and jdlrobson: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:24 thcipriani@deploy1002: Started scap: Backport for Skin: Map namespaces to associated pages inside runOnSkinTemplateNavigationHooks (T319396)
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:17 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:10 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:05 samtar@deploy1002: backport aborted: (duration: 03m 13s)
  • 20:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:51 SandraEbele: Started airflow projectview_hourly_dag
  • 19:50 SandraEbele: killed Oozie projectview-hourly job
  • 19:41 SandraEbele: deployed airflow to fix projectview_hourly_dag
  • 19:34 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@cbdc509]: (no justification provided) (duration: 00m 14s)
  • 19:34 ebysans@deploy1002: Started deploy [airflow-dags/analytics@cbdc509]: (no justification provided)
  • 19:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:31 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:29 ryankemper@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 6 hosts with reason: T313431
  • 19:28 ryankemper@cumin1001: START - Cookbook sre.hosts.downtime for 3:00:00 on 6 hosts with reason: T313431
  • 19:28 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.40.0-wmf.3 refs T314193
  • 19:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 19:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 19:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 19:21 brennen@deploy1002: rebuilt and synchronized wikiversions files: all wikis to 1.40.0-wmf.4 refs T314193
  • 19:15 brennen: train 1.40.0-wmf.4 (T314193) no current blockers, rolling train to all wikis
  • 19:03 inflatador: bking@elastic restarted elastic2025, 2031, 2061, 2084 T313431
  • 18:52 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - T313431
  • 18:52 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic[2025,2031].codfw.wmnet with reason: restarting for config reload - T313431
  • 18:51 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2084.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:50 gehel@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on elastic2061.codfw.wmnet with reason: restarting for config reload - T313431
  • 18:39 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudnet1003.eqiad.wmnet
  • 18:39 andrew@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:35 andrew@cumin1001: START - Cookbook sre.dns.netbox
  • 18:29 andrew@cumin1001: START - Cookbook sre.hosts.decommission for hosts cloudnet1003.eqiad.wmnet
  • 16:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 16:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 15:57 topranks: Applying explicit BFD mode configuration to cr4-ulsfo for Anycast BGP groups.
  • 15:53 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 15:52 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 15:51 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 15:48 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1004.eqiad.wmnet with OS bullseye
  • 15:47 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1003.eqiad.wmnet with OS bullseye
  • 15:45 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:28 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 15:22 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1009.eqiad.wmnet
  • 15:19 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:16 jynus: reload haproxy config on dbproxy1016, dbproxy1017
  • 15:11 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1009.eqiad.wmnet
  • 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1008.eqiad.wmnet
  • 15:10 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:08 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 15:05 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 15:01 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1008.eqiad.wmnet
  • 15:01 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 14:56 bblack: eqiad front edge depooled in DNS
  • 14:49 XioNoX: move asw2-d-eqiad<->cr1 link to new 40G link - T313385
  • 14:45 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 14:43 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cloudnet1005.eqiad.wmnet on all recursors
  • 14:43 cmooney@cumin1001: START - Cookbook sre.dns.wipe-cache cloudnet1005.eqiad.wmnet on all recursors
  • 14:42 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 volans@cumin1001: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) failoid2001.codfw.wmnet on codfw recursors
  • 14:40 volans@cumin1001: START - Cookbook sre.dns.wipe-cache failoid2001.codfw.wmnet on codfw recursors
  • 14:40 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 14:30 XioNoX: moving eqiad row C vrrp mastership to cr1-eqiad
  • 14:28 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:16 hashar: Gerrit upgraded from 3.4.5 to 3.4.6 # T319513
  • 14:13 XioNoX: move asw2-c-eqiad<->cr1 link to new 40G link - T313385
  • 14:12 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001 (duration: 00m 08s)
  • 14:12 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit1001
  • 14:12 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 14:12 hashar: Upgrading primary Gerrit # T319513
  • 14:08 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 14:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002 (duration: 00m 10s)
  • 14:08 hashar@deploy1002: Started deploy [gerrit/gerrit@132ac68]: Gerrit to 3.4.6 on gerrit2002
  • 14:07 vgutierrez: updating HAProxy to version 2.4.19 in ulsfo
  • 14:03 btullis@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts aqs1007.eqiad.wmnet
  • 14:03 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:01 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:48 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1007.eqiad.wmnet
  • 13:41 elukey@cumin1001: END (PASS) - Cookbook sre.kafka.roll-restart-brokers (exit_code=0) for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 13:20 urbanecm: UTC afternoon backport window done
  • 13:20 moritzm: draining ganeti1014 T311687
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 13:18 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 urbanecm@deploy1002: Finished scap: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883) (duration: 05m 12s)
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:17 vgutierrez: partition ats-be cache in cp6008 - T317748
  • 13:16 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1006.eqiad.wmnet
  • 13:16 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:14 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:13 urbanecm@deploy1002: urbanecm and mlitn: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:13 urbanecm@deploy1002: Started scap: Backport for Show thumbnails on Special:Search for NS_FILE + PageImages (T306883)
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:12 urbanecm@deploy1002: Finished scap: Backport for Explicit config for Wikistories discovery module (T314582) (duration: 06m 37s)
  • 13:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 13:12 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:08 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 13:06 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:06 urbanecm@deploy1002: urbanecm and sbisson: Backport for Explicit config for Wikistories discovery module (T314582) synced to the testservers: mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:06 aborrero@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 13:05 urbanecm@deploy1002: Started scap: Backport for Explicit config for Wikistories discovery module (T314582)
  • 12:59 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 12:56 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1026.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 12:54 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1006.eqiad.wmnet
  • 12:45 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti1029.eqiad.wmnet
  • 12:43 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:42 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:40 elukey@cumin1001: START - Cookbook sre.kafka.roll-restart-brokers for Kafka A:kafka-logging-codfw cluster: Roll restart of jvm daemons.
  • 12:39 cmooney@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:36 cmooney@cumin1001: START - Cookbook sre.dns.netbox
  • 12:34 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1005.eqiad.wmnet
  • 12:24 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:21 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 12:15 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1005.eqiad.wmnet
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts aqs1004.eqiad.wmnet
  • 11:32 btullis@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:28 jbond: enable puppet post deploy puppetdb change 814824
  • 11:27 jbond: switch puppetdb replication to use replication slots
  • 11:27 btullis@cumin1001: START - Cookbook sre.dns.netbox
  • 11:27 btullis: cold-reset the BMC on analytics1076
  • 11:22 btullis@cumin1001: START - Cookbook sre.hosts.decommission for hosts aqs1004.eqiad.wmnet
  • 10:58 jbond: disable puppet temporarily to deploy a puppetdb change 814824
  • 10:51 _joe_: installing the upgraded php package everywhere, T318918
  • 10:30 elukey: restart kafka on kafka-logging1003 to reload the config (cleanup old super.users related to past keystore)
  • 10:16 moritzm: installing ruby-rack security updates
  • 10:11 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all remaining wikis
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 1213 hosts
  • 10:07 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 1213 hosts
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging NOkafor out of all services on: 799 hosts
  • 10:06 jmm@cumin2002: START - Cookbook sre.idm.logout Logging NOkafor out of all services on: 799 hosts
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 799 hosts
  • 10:05 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 799 hosts
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:02 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for all wikis (duration: 03m 39s)
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jmads out of all services on: 1213 hosts
  • 10:00 jmm@cumin2002: START - Cookbook sre.idm.logout Logging Jmads out of all services on: 1213 hosts
  • 09:57 moritzm: installing glib2.0 security updates on buster
  • 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for itwiki, arzwiki, ptwiki
  • 09:41 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudnet1005.eqiad.wmnet
  • 09:34 aborrero@cumin1001: START - Cookbook sre.hosts.reboot-single for host cloudnet1005.eqiad.wmnet
  • 09:32 moritzm: installing python-oslo.utils security updates
  • 09:28 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for viwiki, metawiki, frwiktionary
  • 09:22 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for nlwiktionary, ruwiki, jawiki
  • 09:21 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:21 _joe_: installed the upgraded php package to mw1414, T318918
  • 09:20 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:20 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:19 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:18 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for nine wikis (duration: 03m 41s)
  • 09:05 topranks: re-pooling esams after cr2-esams line card reboot
  • 09:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for cebwiki
  • 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for specieswiki
  • 09:04 hoo: Ran extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for ruwiktionary
  • 09:04 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:59 _joe_: uploaded new php 7.4 packages T318918
  • 08:58 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:58 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:54 topranks: rebooting line card fpc 0 on cr2-esams (T318783)
  • 08:53 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for three wikis (duration: 04m 03s)
  • 08:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 08:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 08:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 08:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 08:48 moritzm: installing jetty9 security updates
  • 08:42 moritzm: installing rails security updates
  • 08:37 moritzm: installing puma security updates
  • 08:27 topranks: disabling OSPF on cr2-esams
  • 08:24 cmooney@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
  • 08:24 cmooney@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-esams,cr2-esams IPv6,re0.cr2-esams.mgmt with reason: line card reboot
  • 08:21 topranks: disabling external BGP sessions on cr2-esams prior to line card reboot
  • 08:12 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 08:10 elukey: restart kafka on kafka-logging1002 to reload the config (cleanup old super.users related to past keystore)
  • 08:10 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 08:09 elukey: kafka logging old cert cleanup - `cumin 'A:kafka-logging' 'rm -f /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks'`
  • 08:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1012.eqiad.wmnet to cluster eqiad and group C
  • 08:00 elukey: delete /etc/kafka/ssl/kafka_logging-eqiad_broker.keystore.jks on kafka-logging1001 and restart (old puppet cert + settings deleted)
  • 07:50 topranks: De-pooling esams in advance of cr2-esams line card reboot
  • 07:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 07:36 moritzm: draining ganeti1026 T311687
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1012.eqiad.wmnet with OS bullseye
  • 07:15 moritzm: draining ganeti1005 T311687
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 07:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1012.eqiad.wmnet with reason: host reimage
  • 06:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1012.eqiad.wmnet with OS bullseye
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6079
  • 06:25 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6079
  • 06:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 22616
  • 06:24 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 22616
  • 01:12 reedy@deploy1002: Finished deploy [integration/docroot@dc380cb]: Update jQuery (duration: 00m 11s)
  • 01:12 reedy@deploy1002: Started deploy [integration/docroot@dc380cb]: Update jQuery
  • 01:03 reedy@deploy1002: Finished deploy [integration/docroot@5cd2243]: Minor fixes (duration: 00m 12s)
  • 01:03 reedy@deploy1002: Started deploy [integration/docroot@5cd2243]: Minor fixes
  • 00:35 reedy@deploy1002: Finished deploy [integration/docroot@13687ed]: More minor updates (duration: 00m 30s)
  • 00:35 reedy@deploy1002: Started deploy [integration/docroot@13687ed]: More minor updates

2022-10-05

  • 22:27 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: Cleanup and timestamps (duration: 00m 07s)
  • 22:27 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
  • 22:21 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 06s)
  • 22:21 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
  • 22:19 reedy@deploy1002: deploy aborted: Cleanup and timestamps (duration: 00m 22s)
  • 22:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: Cleanup and timestamps
  • 22:18 dancy@deploy1002: Finished deploy [integration/docroot@a136ce6]: (no justification provided) (duration: 00m 10s)
  • 22:17 dancy@deploy1002: Started deploy [integration/docroot@a136ce6]: (no justification provided)
  • 22:17 dancy@deploy1002: Installation of scap version "4.27.0" completed for 559 hosts
  • 22:17 dancy@deploy1002: Installing scap version "4.27.0" for 559 hosts
  • 21:41 dancy@deploy1002: Installation of scap version "4.26.0" completed for 559 hosts
  • 21:41 dancy@deploy1002: Installing scap version "4.26.0" for 559 hosts
  • 20:33 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 01m 05s)
  • 20:32 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:27 sukhe: running authdns-update for CR 838882
  • 20:26 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 10s)
  • 20:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:26 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:25 sukhe: homer "cr*-ulsfo*" commit "Gerrit 838239: sites.yaml: add dns4003 to anycast_neighbors"
  • 20:24 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 06s)
  • 20:23 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:22 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:22 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:22 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 31s)
  • 20:22 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:21 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:19 reedy@deploy1002: Finished deploy [integration/docroot@a136ce6]: More minor cleanup (duration: 00m 42s)
  • 20:19 urbanecm@deploy1002: Finished scap: Backport for Remove Research Incentive survey from arwiki (T318328) (duration: 05m 13s)
  • 20:19 reedy@deploy1002: Started deploy [integration/docroot@a136ce6]: More minor cleanup
  • 20:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:15 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:14 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:14 urbanecm@deploy1002: urbanecm and dani: Backport for Remove Research Incentive survey from arwiki (T318328) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:14 urbanecm@deploy1002: Started scap: Backport for Remove Research Incentive survey from arwiki (T318328)
  • 20:11 urbanecm@deploy1002: Finished scap: Backport for Deploy Research Incentive survey on eswiki (T318331) (duration: 06m 51s)
  • 20:09 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:08 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:05 urbanecm@deploy1002: urbanecm and dani: Backport for Deploy Research Incentive survey on eswiki (T318331) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet
  • 20:05 urbanecm@deploy1002: Started scap: Backport for Deploy Research Incentive survey on eswiki (T318331)
  • 20:03 mutante: registry* (4 servers) - disabling puppet, deploying gerrit:838859 - T308501
  • 19:57 reedy@deploy1002: Finished deploy [integration/docroot@09eb565]: T319461 and cleanup (duration: 00m 10s)
  • 19:56 reedy@deploy1002: Started deploy [integration/docroot@09eb565]: T319461 and cleanup
  • 18:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:31 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dns4003.wikimedia.org with OS buster
  • 18:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:29 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:27 brennen@deploy1002: Synchronized php: group1 wikis to 1.40.0-wmf.4 refs T314193 (duration: 03m 40s)
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:23 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.40.0-wmf.4 refs T314193
  • 18:22 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:18 brennen: train 1.40.0-wmf.4 (T314193) no current blockers, rolling train to group1
  • 18:05 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:01 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 17:54 ejegg: payments-wiki upgraded from aeee9676 to 4e1f308b
  • 17:43 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
  • 17:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 14s)
  • 17:20 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 17:18 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 00m 18s)
  • 17:18 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 17:17 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
  • 17:12 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:55 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:54 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:53 cjming: deployed labs-only config
  • 15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ganeti1012.eqiad.wmnet with reason: Downtime for removal from Ganeti cluster and eventual bullseye reimage
  • 15:29 moritzm: installing gdal security updates
  • 15:27 SandraEbele: deployed refinery source
  • 14:51 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 15 days, 0:00:00 on cloudnet[1005-1006].eqiad.wmnet with reason: migrating
  • 14:39 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1004.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
  • 14:38 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on cloudnet1003.eqiad.wmnet with reason: decom
  • 14:36 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 8359
  • 14:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 8359
  • 14:30 papaul: ongoing maintenance on msw1-eqiad
  • 14:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1032.eqiad.wmnet with OS bullseye
  • 14:20 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a] (duration: 04m 24s)
  • 14:16 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a] (thin): Regular analytics weekly train THIN [analytics/refinery@7e16d2a]
  • 14:16 mforns@deploy1002: Finished deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a] (duration: 10m 27s)
  • 14:15 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
  • 14:08 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
  • 14:07 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 14:06 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 14:05 mforns@deploy1002: Started deploy [analytics/refinery@7e16d2a]: Regular analytics weekly train [analytics/refinery@7e16d2a]
  • 13:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1032.eqiad.wmnet with OS bullseye
  • 13:37 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@f7a68c2]: (no justification provided) (duration: 00m 12s)
  • 13:36 ebysans@deploy1002: Started deploy [airflow-dags/analytics@f7a68c2]: (no justification provided)
  • 13:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:26 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:25 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:22 SandraEbele: deploying fix for projectview dags on airflow
  • 13:21 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiktionary/frwiki (duration: 03m 38s)
  • 13:20 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:19 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:18 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1031.eqiad.wmnet with OS bullseye
  • 13:07 moritzm: draining ganeti1012 T311687
  • 13:04 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for zhwiki
  • 13:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
  • 13:00 vgutierrez: test HAProxy 2.4.19 in cp4026 && cp4032
  • 12:59 vgutierrez: vgutierrez@apt1001:~$ sudo -i reprepro --component thirdparty/haproxy24 update buster-wikimedia # fetch HAProxy 2.4.19
  • 12:59 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
  • 12:48 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:47 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:46 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1031.eqiad.wmnet with OS bullseye
  • 12:45 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:41 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for enwiki
  • 12:33 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 12:32 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1030.eqiad.wmnet with OS bullseye
  • 12:30 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:28 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for enwiki/zhwiki (duration: 03m 46s)
  • 12:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
  • 12:16 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1030.eqiad.wmnet with reason: host reimage
  • 12:13 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1030.eqiad.wmnet with OS bullseye
  • 11:54 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:53 XioNoX: fix MTU between eqiad core routers and cloudsw - T315838
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1029.eqiad.wmnet with OS bullseye
  • 11:52 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 11:49 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 11:33 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 11:33 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
  • 11:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1029.eqiad.wmnet with OS bullseye
  • 11:04 moritzm: running "gnt-cluster upgrade --to 3.0" for ganeti/eqiad T311687
  • 11:01 vgutierrez: repool cp2036 - T319394
  • 10:53 vgutierrez: powercycle cp2036 - T319394
  • 10:52 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:51 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:51 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:50 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:48 vgutierrez@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp2036.codfw.wmnet
  • 10:46 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for commonswiki
  • 10:45 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 10:44 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for commonswiki (duration: 03m 51s)
  • 10:44 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 10:44 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 10:43 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 10:36 moritzm: installing gdk-pixbuf security updates
  • 09:52 hoo: Running extensions/Wikibase/client/maintenance/populateUnexpectedUnconnectedPagePageProp.php for all of ruwikinews
  • 09:51 hoo: Ran extensions/Wikibase/client/maintenance/PopulateUnexpectedUnconnectedPagePageProp.php for all of arwiki
  • 09:32 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:31 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:31 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:31 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for ruwikinews (duration: 03m 39s)
  • 09:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:21 moritzm: upgrading ganeti/eqiad nodes to Ganeti 3 T311687
  • 09:20 dcausse: restarting blazegraph on wdqs1014 (BlazegraphFreeAllocatorsDecreasingRapidly)
  • 09:15 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 09:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 09:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 09:09 hoo@deploy1002: Synchronized wmf-config/InitialiseSettings.php: Disable UnconnectedPagePagePropMigrationLegacyFormat for arwiki (duration: 03m 49s)
  • 09:06 moritzm: reimport ganeti 3.0.1-1~bpo10+1 to component/ganeti3 (got removed alongside via a reprepro bug/misfeature when the bullseye component was removed)
  • 07:54 elukey: restart kafka on kafka-logging1003 to pick up new PKI TLS settings
  • 07:50 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:49 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1003.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35360 and previous config saved to /var/cache/conftool/dbconfig/20221005-065519-root.json
  • 06:40 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35359 and previous config saved to /var/cache/conftool/dbconfig/20221005-064014-root.json
  • 06:30 elukey: restart kafka on kafka-logging1002 to pick up the new cert+settings for PKI
  • 06:27 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:27 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1002.eqiad.wmnet with reason: Kafka PKI upgrade
  • 06:25 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35358 and previous config saved to /var/cache/conftool/dbconfig/20221005-062509-root.json
  • 06:10 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35357 and previous config saved to /var/cache/conftool/dbconfig/20221005-061004-root.json
  • 05:55 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35356 and previous config saved to /var/cache/conftool/dbconfig/20221005-055459-root.json
  • 05:50 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 62044
  • 05:39 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35355 and previous config saved to /var/cache/conftool/dbconfig/20221005-053954-root.json
  • 05:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 62044
  • 05:24 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35354 and previous config saved to /var/cache/conftool/dbconfig/20221005-052449-root.json
  • 05:09 marostegui@cumin1001: dbctl commit (dc=all): 'es2030 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35353 and previous config saved to /var/cache/conftool/dbconfig/20221005-050944-root.json
  • 05:00 marostegui@cumin1001: dbctl commit (dc=all): 'Depool es2030', diff saved to https://phabricator.wikimedia.org/P35352 and previous config saved to /var/cache/conftool/dbconfig/20221005-050018-root.json
  • 02:59 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
  • 02:21 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 02:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt1023.eqiad.wmnet
  • 02:20 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 02:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudvirt1023.eqiad.wmnet
  • 02:19 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host cloudvirt1023.eqiad.wmnet
  • 00:05 sukhe: disable puppet on dns4003 till we resolve the puppet failures

2022-10-04

  • 23:09 andrew@cumin1001: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 22:53 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 21:28 cjming: end of UTC late backport window
  • 21:26 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:25 cjming@deploy1002: Finished scap: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks"" (duration: 05m 06s)
  • 21:25 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:25 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:24 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:21 cjming@deploy1002: cjming and cjming: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks"" synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:20 cjming@deploy1002: Started scap: Backport for Revert "Revert "Add wordmark and tagline for Bengali Wikibooks""
  • 21:14 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:10 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:07 cjming@deploy1002: Finished scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317) (duration: 05m 40s)
  • 21:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 21:04 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 21:03 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 21:01 cjming@deploy1002: cjming and mdsshakil: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317) synced to the testservers: mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 21:01 cjming@deploy1002: Started scap: Backport for Enable wgMinervaEnableSiteNotice for bnwikibooks (T319317)
  • 20:59 cjming@deploy1002: Finished scap: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks" (duration: 06m 35s)
  • 20:58 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:57 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:56 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:53 cjming@deploy1002: cjming and trainbranchbot: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks" synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 20:52 cjming@deploy1002: Started scap: Backport for Revert "Add wordmark and tagline for Bengali Wikibooks"
  • 20:51 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:49 cjming@deploy1002: Sync cancelled.
  • 20:47 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:47 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:46 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:42 cjming@deploy1002: cjming and aishik: Backport for Add wordmark and tagline for Bengali Wikibooks (T319320) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet
  • 20:41 cjming@deploy1002: Started scap: Backport for Add wordmark and tagline for Bengali Wikibooks (T319320)
  • 20:39 cjming@deploy1002: Finished scap: Backport for ParsoidHandler: use metrics from SiteConfig (duration: 14m 29s)
  • 20:36 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 20:35 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 20:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 20:25 cjming@deploy1002: cjming and d3r1ck01: Backport for ParsoidHandler: use metrics from SiteConfig synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 20:25 cjming@deploy1002: Started scap: Backport for ParsoidHandler: use metrics from SiteConfig
  • 19:54 sukhe@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS buster
  • 18:51 sukhe@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:48 sukhe@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 18:34 mutante: gerrit - deploying puppet refactoring change
  • 18:34 tzatziki: removing 1 file for legal compliance
  • 18:29 sukhe@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS buster
  • 18:27 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:24 tzatziki: removing 1 file for legal compliance
  • 18:24 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:24 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 18:23 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 18:21 moritzm: installing gdk-pixbuf security updates
  • 18:19 demon@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.40.0-wmf.4 refs T314193
  • 18:03 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 18:02 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 18:02 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 17:59 ejegg: turned fundraising scheduled jobs back on
  • 17:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 17:57 urbanecm@deploy1002: Finished scap: Backport for Mentee table: fix wrong less import (T319321) (duration: 06m 58s)
  • 17:55 moritzm: installing libsndfile security updates
  • 17:50 urbanecm@deploy1002: urbanecm and urbanecm: Backport for Mentee table: fix wrong less import (T319321) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet
  • 17:50 urbanecm@deploy1002: Started scap: Backport for Mentee table: fix wrong less import (T319321)
  • 17:49 ejegg: turned off fundraising scheduled jobs for civi deploy
  • 17:28 tzatziki: removing 4 files for legal compliance
  • 17:04 mutante: gerrit - deployed 832345 - scap and daemon users became decoupled (T317412)
  • 17:02 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:56 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:49 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:33 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:30 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:25 brennen@deploy1002: Pruned MediaWiki: 1.40.0-wmf.2 (duration: 02m 02s)
  • 16:24 brennen@deploy1002: Finished scap: testwikis wikis to 1.40.0-wmf.4 refs T314193 (duration: 28m 55s)
  • 16:21 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host dns4003.wikimedia.org with OS bullseye
  • 16:03 robh@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 16:00 robh@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dns4003.wikimedia.org with reason: host reimage
  • 16:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:57 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:54 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2003.codfw.wmnet with OS buster
  • 15:54 brennen@deploy1002: Started scap: testwikis wikis to 1.40.0-wmf.4 refs T314193
  • 15:53 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 15:53 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 15:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 15:51 brennen: restarting `/usr/bin/scap stage-train --yes auto` after failed staging (T314193), cc: ^demon
  • 15:48 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 15:47 sukhe: disable Puppet on A:cp and A:eqiad for T309651
  • 15:42 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 15:33 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 15:29 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2003.codfw.wmnet with reason: host reimage
  • 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:25 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:16 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2003.codfw.wmnet with OS buster
  • 15:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:11 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
  • 15:10 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2003.codfw.wmnet with reason: Prep for reimage
  • 15:10 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 15:10 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore2002.codfw.wmnet with OS buster
  • 15:09 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 15:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 15:06 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 15:05 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 15:03 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 15:02 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 15:02 moritzm: installing snakeyaml security updates
  • 14:58 andrew@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:55 papaul: maintenance complete on msw1-codfw
  • 14:51 sukhe: disable Puppet on A:cp and A:esams for T309651
  • 14:50 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 14:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2002.codfw.wmnet with reason: host reimage
  • 14:40 moritzm: installing maven-shared-utils security updates
  • 14:34 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2002.codfw.wmnet with OS buster
  • 14:32 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
  • 14:32 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2002.codfw.wmnet with reason: Prep for reimage
  • 14:30 papaul: ongoing maintenance on msw1-codfw
  • 14:29 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 14:27 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 14:22 filippo@cumin1001: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:14 XioNoX: netbox - Move VRRP IPs to FHRP group feature - T311218
  • 14:13 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:12 filippo@cumin1001: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:12 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/tests/phpunit/: Backport: Revert "Introduce LanguageVariantConverter" (T319282) (2/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 52s)
  • 14:12 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 14:08 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/includes/: Backport: Revert "Introduce LanguageVariantConverter" (T319282) (1/2; no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 43s)
  • 14:07 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 14:06 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 14:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 14:03 lucaswerkmeister-wmde@deploy1002: Synchronized php-1.40.0-wmf.4/extensions/Kartographer/modules/dialog: Backport: Log basic nearby and fullscreen events (T315972, T318678) (no wikis use wmf.4 yet, but the code exists, so the change needs to be synced) (duration: 03m 42s)
  • 14:02 andrew@cumin1001: START - Cookbook sre.hosts.reimage for host cloudvirt1023.eqiad.wmnet with OS bullseye
  • 14:00 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:59 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:58 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:55 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=codfw
  • 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 13:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 13:49 jbond@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync data - jbond@cumin1001"
  • 13:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35347 and previous config saved to /var/cache/conftool/dbconfig/20221004-134947-root.json
  • 13:49 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=codfw
  • 13:48 sukhe: disable Puppet on A:cp and A:eqsin for T309651
  • 13:47 jbond@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync data - jbond@cumin1001"
  • 13:42 awight: EU backport window finished.
  • 13:40 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 13:38 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "sync-mgmt - filippo@cumin1001"
  • 13:38 jmm@cumin2002: END (PASS) - Cookbook sre.maps.roll-restart (exit_code=0) rolling restart_daemons on A:maps-replica-eqiad
  • 13:38 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:37 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:36 awight@deploy1002: Finished scap: Backport for Wire new event stream for maps interactions (T315972 T318678) (duration: 06m 49s)
  • 13:36 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:35 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-replica-eqiad
  • 13:35 filippo@cumin1001: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "filippo test - filippo@cumin1001"
  • 13:34 filippo@cumin1001: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "filippo test - filippo@cumin1001"
  • 13:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35346 and previous config saved to /var/cache/conftool/dbconfig/20221004-133442-root.json
  • 13:32 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
  • 13:31 jbond: re-enable puppet post deploy a puppetmaster change 838144
  • 13:30 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbox - try 2 - CR826559 - ayounsi@cumin1001
  • 13:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
  • 13:30 awight@deploy1002: awight and awight: Backport for Wire new event stream for maps interactions (T315972 T318678) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet
  • 13:29 awight@deploy1002: Started scap: Backport for Wire new event stream for maps interactions (T315972 T318678)
  • 13:28 ayounsi@cumin1001: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1001.eqiad.wmnet with reason: update to wmf-netbx CR826559 - ayounsi@cumin1001
  • 13:27 awight@deploy1002: Finished scap: Backport for ukwiki: Create flood group (T319243) (duration: 05m 16s)
  • 13:24 jbond: disable puppet to deploy a puppetmaster change 838144
  • 13:22 awight@deploy1002: awight and stang: Backport for ukwiki: Create flood group (T319243) synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet
  • 13:21 awight@deploy1002: Started scap: Backport for ukwiki: Create flood group (T319243)
  • 13:21 awight@deploy1002: Finished scap: Backport for throttle: Add throttle rule for 2022-10-13 (T319244) (duration: 12m 48s)
  • 13:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35345 and previous config saved to /var/cache/conftool/dbconfig/20221004-131937-root.json
  • 13:16 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 13:14 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:13 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
  • 13:13 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 13:13 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 13:12 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 13:11 awight@deploy1002: awight and stang: Backport for throttle: Add throttle rule for 2022-10-13 (T319244) synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet
  • 13:08 awight@deploy1002: Started scap: Backport for throttle: Add throttle rule for 2022-10-13 (T319244)
  • 13:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35343 and previous config saved to /var/cache/conftool/dbconfig/20221004-130432-root.json
  • 12:58 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:56 aborrero@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1006.eqiad.wmnet with reason: host reimage
  • 12:53 aborrero@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudnet1005.eqiad.wmnet with reason: host reimage
  • 12:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35342 and previous config saved to /var/cache/conftool/dbconfig/20221004-124927-root.json
  • 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:37 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35341 and previous config saved to /var/cache/conftool/dbconfig/20221004-123422-root.json
  • 12:31 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458 (duration: 00m 58s)
  • 12:30 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458
  • 12:29 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS buster
  • 12:26 cgoubert@deploy1002: Finished deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458 (duration: 00m 14s)
  • 12:26 cgoubert@deploy1002: Started deploy [docker-pkg/deploy@24fbee1]: Release 3.0.3 # T310458
  • 12:21 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 12:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35340 and previous config saved to /var/cache/conftool/dbconfig/20221004-121917-root.json
  • 12:14 volans: uploaded python3-gjson_0.1.0 to apt.wikimedia.org bullseye-wikimedia
  • 12:13 jbond@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:10 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 12:09 jbond@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:08 hnowlan@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host sessionstore2001.codfw.wmnet with OS buster
  • 12:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2181 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35339 and previous config saved to /var/cache/conftool/dbconfig/20221004-120413-root.json
  • 11:55 jbond@cumin1001: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS buster
  • 11:43 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 11:40 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore2001.codfw.wmnet with reason: host reimage
  • 11:24 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
  • 11:22 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
  • 11:11 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 11:10 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 11:05 jayme: published calico 3.23.3 debian packages in bullseye component/calico323 as well as corresponding docker images - T307943
  • 11:04 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:58 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:58 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:56 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
  • 10:55 aborrero@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:54 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1006.eqiad.wmnet with OS bullseye
  • 10:54 hnowlan@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sessionstore2001.codfw.wmnet with OS buster
  • 10:53 aborrero@cumin1001: START - Cookbook sre.hosts.reimage for host cloudnet1005.eqiad.wmnet with OS bullseye
  • 10:44 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 135158
  • 10:43 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 135158
  • 10:43 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 9119
  • 10:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 9119
  • 10:41 moritzm: installing expat security updates
  • 09:59 jmm@cumin2002: END (FAIL) - Cookbook sre.maps.roll-restart (exit_code=1) rolling restart_daemons on A:maps-codfw
  • 09:47 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:46 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 09:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 09:44 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 09:43 btullis@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 09:42 jayme: deployed istio-ingressgateway with additional envoy native metrics to wikikube codfw and eqiad
  • 09:40 hnowlan@cumin2002: START - Cookbook sre.hosts.reimage for host sessionstore2001.codfw.wmnet with OS buster
  • 09:37 jmm@cumin2002: START - Cookbook sre.maps.roll-restart rolling restart_daemons on A:maps-codfw
  • 09:36 hnowlan@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
  • 09:36 hnowlan@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore2001.codfw.wmnet with reason: Prep for reimage
  • 09:36 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 20 hosts
  • 09:35 cgoubert@cumin1001: START - Cookbook sre.hosts.remove-downtime for 20 hosts
  • 09:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35338 and previous config saved to /var/cache/conftool/dbconfig/20221004-093530-root.json
  • 09:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35337 and previous config saved to /var/cache/conftool/dbconfig/20221004-092025-root.json
  • 09:08 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 09:08 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 09:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35336 and previous config saved to /var/cache/conftool/dbconfig/20221004-090520-root.json
  • 08:56 cgoubert@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 20 hosts with reason: php7.2 removal
  • 08:55 cgoubert@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on 20 hosts with reason: php7.2 removal
  • 08:52 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:52 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35335 and previous config saved to /var/cache/conftool/dbconfig/20221004-085015-root.json
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35334 and previous config saved to /var/cache/conftool/dbconfig/20221004-083511-root.json
  • 08:20 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35333 and previous config saved to /var/cache/conftool/dbconfig/20221004-082005-root.json
  • 08:17 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:16 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db2181.codfw.wmnet with reason: Upgrading
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35332 and previous config saved to /var/cache/conftool/dbconfig/20221004-080500-root.json
  • 08:03 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2181', diff saved to https://phabricator.wikimedia.org/P35331 and previous config saved to /var/cache/conftool/dbconfig/20221004-080338-root.json
  • 07:52 moritzm: installing libdatetime-timezone-perl updates (catching up with latest timezone changes)
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2178 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35330 and previous config saved to /var/cache/conftool/dbconfig/20221004-074955-root.json
  • 07:36 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: sync
  • 07:36 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: sync
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35329 and previous config saved to /var/cache/conftool/dbconfig/20221004-072158-root.json
  • 07:16 elukey: restart kafka on kafka-logging1001 to pick up its new PKI TLS cert
  • 07:11 elukey@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:11 elukey@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on kafka-logging1001.eqiad.wmnet with reason: Kafka PKI upgrade
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35328 and previous config saved to /var/cache/conftool/dbconfig/20221004-070653-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35327 and previous config saved to /var/cache/conftool/dbconfig/20221004-065148-root.json
  • 06:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 06:42 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 06:42 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 06:39 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35326 and previous config saved to /var/cache/conftool/dbconfig/20221004-063643-root.json
  • 06:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 25885
  • 06:32 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 25885
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35325 and previous config saved to /var/cache/conftool/dbconfig/20221004-062138-root.json
  • 06:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35324 and previous config saved to /var/cache/conftool/dbconfig/20221004-060633-root.json
  • 05:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 3%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35323 and previous config saved to /var/cache/conftool/dbconfig/20221004-055128-root.json
  • 05:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: After HW maintenance', diff saved to https://phabricator.wikimedia.org/P35322 and previous config saved to /var/cache/conftool/dbconfig/20221004-053623-root.json
  • 03:12 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 03:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 03:07 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:31 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:30 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:28 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 02:13 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 02:09 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 02:05 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply

2022-10-03

  • 21:45 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:44 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 21:44 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 21:18 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 19:41 ryankemper: [Elastic] Unbanned `elastic1066`
  • 19:37 ryankemper: [Elastic] Restarted psi on `elastic1066`; will unban host after process is up and running
  • 19:32 robh: msw1-ulsfo swap successful, mgmt recovering in icinga and tested connection with 3 servers all work
  • 19:25 robh: msw1-ulsfo swap, some mgmt flapping expected, swap complete but not powered back up yet
  • 19:22 ryankemper: [Elastic] Banned `elastic1066` (`curl -H 'Content-Type: application/json' -XPUT http://localhost:9600/_cluster/settings -d '{"transient":{"cluster.routing.allocation.exclude":{"_host": "","_name": "elastic1066-production-search-psi-eqiad"}}}'`); will restart elasticsearch-psi after shards drain
  • 19:15 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 18:48 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 18:41 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dns4003.wikimedia.org with OS bullseye
  • 18:34 robh@cumin2002: START - Cookbook sre.hosts.reimage for host dns4003.wikimedia.org with OS bullseye
  • 18:30 robh@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:30 bblack@cumin1001: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp4045.ulsfo.wmnet with OS buster
  • 18:21 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:12 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:06 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:04 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 18:00 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:52 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:42 robh@cumin2002: START - Cookbook sre.hosts.provision for host dns4003.mgmt.ulsfo.wmnet with reboot policy FORCED
  • 17:41 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dns4003
  • 17:41 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dns4003
  • 17:40 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:37 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:29 bblack@cumin1001: START - Cookbook sre.hosts.reimage for host cp4045.ulsfo.wmnet with OS buster
  • 17:29 sukhe: running homer "cr*-ulsfo*" commit "Gerrit 837727: remove dns4001 for anycast neighbors."
  • 17:13 robh@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dns4001.wikimedia.org
  • 17:13 robh@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:08 robh@cumin2002: START - Cookbook sre.dns.netbox
  • 17:04 robh@cumin2002: START - Cookbook sre.hosts.decommission for hosts dns4001.wikimedia.org
  • 16:43 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:39 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:39 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:34 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:33 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 30781
  • 16:33 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 30781
  • 16:29 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:28 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:27 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:24 urbanecm@deploy1002: Finished scap: Backport for throttle: Remove out of date rules (duration: 04m 16s)
  • 16:22 mwdebug-deploy@deploy1002: helmfile [codfw] DONE helmfile.d/services/mwdebug: apply
  • 16:21 mwdebug-deploy@deploy1002: helmfile [codfw] START helmfile.d/services/mwdebug: apply
  • 16:21 mwdebug-deploy@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mwdebug: apply
  • 16:20 mwdebug-deploy@deploy1002: helmfile [eqiad] START helmfile.d/services/mwdebug: apply
  • 16:20 urbanecm@deploy1002: urbanecm and urbanecm: Backport for throttle: Remove out of date rules synced to the testservers: mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet
  • 16:20 urbanecm@deploy1002: Started scap: Backport for throttle: Remove out of date rules
  • 16:18 urbanecm@deploy1002: Synchronized wmf-config/InitialiseSettings.php: cae49b8: throttle: Add throttle rule for 2022-10-06 (T319212) (duration: 04m 21s)
  • 16:14 sukhe: disable Puppet on cp hosts in codfw: rolling out T309651
  • 15:15 sukhe: disable Puppet on cp hosts in ulsfo: rolling out T309651
  • 15:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35320 and previous config saved to /var/cache/conftool/dbconfig/20221003-151438-root.json
  • 15:06 papaul: maintenance complete on mr1-esams
  • 14:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35319 and previous config saved to /var/cache/conftool/dbconfig/20221003-145933-root.json
  • 14:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35318 and previous config saved to /var/cache/conftool/dbconfig/20221003-144428-root.json
  • 14:35 sukhe: upgrade A:cp and A:drmrs to ATS 9.1.3-1wm2 from 9.1.3-1wm1: T309651
  • 14:31 papaul: ongoing maintenance on mr1-esams
  • 14:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35317 and previous config saved to /var/cache/conftool/dbconfig/20221003-142923-root.json
  • 14:14 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35316 and previous config saved to /var/cache/conftool/dbconfig/20221003-141417-root.json
  • 14:08 sukhe: upgrade cp4026, cp4032 to ATS 9.1.3-1wm2 from 9.1.3-1wm1: T309651
  • 13:59 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35315 and previous config saved to /var/cache/conftool/dbconfig/20221003-135912-root.json
  • 13:57 sukhe: reprepro -C component/trafficserver9 include buster-wikimedia trafficserver_9.1.3-1wm2_amd64.changes: T309651
  • 13:44 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35314 and previous config saved to /var/cache/conftool/dbconfig/20221003-134407-root.json
  • 13:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35313 and previous config saved to /var/cache/conftool/dbconfig/20221003-134024-root.json
  • 13:29 marostegui@cumin1001: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35312 and previous config saved to /var/cache/conftool/dbconfig/20221003-132902-root.json
  • 13:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35311 and previous config saved to /var/cache/conftool/dbconfig/20221003-132519-root.json
  • 13:18 vgutierrez: enforcing origin-form|asterisk-form for request-target on varnish (could trigger spikes of HTTP 400 errors) - T318676
  • 13:10 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35310 and previous config saved to /var/cache/conftool/dbconfig/20221003-131014-root.json
  • 12:55 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35308 and previous config saved to /var/cache/conftool/dbconfig/20221003-125509-root.json
  • 12:40 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35307 and previous config saved to /var/cache/conftool/dbconfig/20221003-124004-root.json
  • 12:25 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35306 and previous config saved to /var/cache/conftool/dbconfig/20221003-122459-root.json
  • 12:09 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35305 and previous config saved to /var/cache/conftool/dbconfig/20221003-120954-root.json
  • 12:02 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P35303 and previous config saved to /var/cache/conftool/dbconfig/20221003-120208-root.json
  • 12:01 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
  • 12:01 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db2123.codfw.wmnet with reason: Cloning
  • 12:00 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
  • 12:00 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1116.eqiad.wmnet with reason: Reboot
  • 11:54 marostegui@cumin1001: dbctl commit (dc=all): 'db2157 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35302 and previous config saved to /var/cache/conftool/dbconfig/20221003-115449-root.json
  • 11:54 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
  • 11:54 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on db1117.eqiad.wmnet with reason: Reboot
  • 11:28 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 11:28 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 11:27 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1003.eqiad.wmnet with OS buster
  • 11:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 11:20 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 11:08 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 11:04 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1003.eqiad.wmnet with reason: host reimage
  • 10:52 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1003.eqiad.wmnet with OS buster
  • 10:49 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
  • 10:48 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1003.eqiad.wmnet with reason: Prep for reimage
  • 10:41 hnowlan@puppetmaster1001: conftool action : set/pooled=true; selector: dnsdisc=sessionstore,name=eqiad
  • 10:41 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sessionstore1002.eqiad.wmnet with OS buster
  • 10:40 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 10:40 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 10:39 hnowlan: starting cassandra on reimaged sessionstore1002
  • 10:37 _joe_: remove stale druid.svc.eqiad.wmnet certificate from the puppetmaster CA; it was expired anyways
  • 10:32 hnowlan@puppetmaster1001: conftool action : set/pooled=false; selector: dnsdisc=sessionstore,name=eqiad
  • 10:31 jelto@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 10:31 jelto@cumin1001: START - Cookbook sre.hosts.downtime for 0:20:00 on gitlab1004.wikimedia.org with reason: upgrade gitlab1004 to new version
  • 10:19 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 10:16 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 2:00:00 on sessionstore1002.eqiad.wmnet with reason: host reimage
  • 10:05 hnowlan@cumin1001: START - Cookbook sre.hosts.reimage for host sessionstore1002.eqiad.wmnet with OS buster
  • 10:00 hnowlan: c-foreach-nt drain on sessionstore1002
  • 10:00 hnowlan@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
  • 10:00 hnowlan@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on sessionstore1002.eqiad.wmnet with reason: Prep for reimage
  • 09:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35300 and previous config saved to /var/cache/conftool/dbconfig/20221003-092519-root.json
  • 09:22 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 31133
  • 09:21 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 31133
  • 09:11 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 62044
  • 09:11 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 62044
  • 09:10 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35299 and previous config saved to /var/cache/conftool/dbconfig/20221003-091014-root.json
  • 08:59 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
  • 08:59 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 8:00:00 on db[2157,2178].codfw.wmnet with reason: Reclone
  • 08:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2157', diff saved to https://phabricator.wikimedia.org/P35297 and previous config saved to /var/cache/conftool/dbconfig/20221003-085840-root.json
  • 08:55 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35296 and previous config saved to /var/cache/conftool/dbconfig/20221003-085509-root.json
  • 08:54 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 12975
  • 08:53 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 12975
  • 08:50 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35295 and previous config saved to /var/cache/conftool/dbconfig/20221003-085007-root.json
  • 08:40 vgutierrez@cumin1001: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cp5001.eqsin.wmnet
  • 08:40 vgutierrez@cumin1001: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:40 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35294 and previous config saved to /var/cache/conftool/dbconfig/20221003-084004-root.json
  • 08:39 ayounsi@cumin1001: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'email' for AS: 3303
  • 08:38 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 3303
  • 08:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 100%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35293 and previous config saved to /var/cache/conftool/dbconfig/20221003-083729-root.json
  • 08:36 vgutierrez@cumin1001: START - Cookbook sre.dns.netbox
  • 08:35 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12956
  • 08:35 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35292 and previous config saved to /var/cache/conftool/dbconfig/20221003-083502-root.json
  • 08:34 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12956
  • 08:30 vgutierrez@cumin1001: START - Cookbook sre.hosts.decommission for hosts cp5001.eqsin.wmnet
  • 08:29 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 15557
  • 08:28 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 15557
  • 08:26 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 12975
  • 08:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 12975
  • 08:25 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35291 and previous config saved to /var/cache/conftool/dbconfig/20221003-082459-root.json
  • 08:24 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 30781
  • 08:23 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 30781
  • 08:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 75%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35290 and previous config saved to /var/cache/conftool/dbconfig/20221003-082224-root.json
  • 08:21 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 39386
  • 08:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35289 and previous config saved to /var/cache/conftool/dbconfig/20221003-081955-root.json
  • 08:16 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 39386
  • 08:09 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35288 and previous config saved to /var/cache/conftool/dbconfig/20221003-080954-root.json
  • 08:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 50%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35287 and previous config saved to /var/cache/conftool/dbconfig/20221003-080719-root.json
  • 08:06 ayounsi@cumin1001: END (ERROR) - Cookbook sre.network.peering (exit_code=97) with action 'email' for AS: 16509
  • 08:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35286 and previous config saved to /var/cache/conftool/dbconfig/20221003-080556-root.json
  • 08:05 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16509
  • 08:05 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 08:05 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 08:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35285 and previous config saved to /var/cache/conftool/dbconfig/20221003-080451-root.json
  • 07:57 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 07:57 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db2178.codfw.wmnet with reason: Upgrade to 10.6
  • 07:56 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2178', diff saved to https://phabricator.wikimedia.org/P35284 and previous config saved to /var/cache/conftool/dbconfig/20221003-075643-root.json
  • 07:54 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35283 and previous config saved to /var/cache/conftool/dbconfig/20221003-075449-root.json
  • 07:52 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 25%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35282 and previous config saved to /var/cache/conftool/dbconfig/20221003-075214-root.json
  • 07:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35281 and previous config saved to /var/cache/conftool/dbconfig/20221003-075051-root.json
  • 07:49 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35280 and previous config saved to /var/cache/conftool/dbconfig/20221003-074946-root.json
  • 07:42 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 16637
  • 07:42 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'email' for AS: 16637
  • 07:39 marostegui@cumin1001: dbctl commit (dc=all): 'db1200 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35279 and previous config saved to /var/cache/conftool/dbconfig/20221003-073944-root.json
  • 07:37 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 10%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35278 and previous config saved to /var/cache/conftool/dbconfig/20221003-073709-root.json
  • 07:36 marostegui@cumin1001: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
  • 07:36 XioNoX: cr2-drmrs# set chassis fpc 0 sampling-instance pmacct
  • 07:36 marostegui@cumin1001: START - Cookbook sre.hosts.downtime for 1:00:00 on db1200.eqiad.wmnet with reason: Upgrade to 10.6
  • 07:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35277 and previous config saved to /var/cache/conftool/dbconfig/20221003-073627-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1200', diff saved to https://phabricator.wikimedia.org/P35276 and previous config saved to /var/cache/conftool/dbconfig/20221003-073556-root.json
  • 07:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35275 and previous config saved to /var/cache/conftool/dbconfig/20221003-073546-root.json
  • 07:34 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35274 and previous config saved to /var/cache/conftool/dbconfig/20221003-073441-root.json
  • 07:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35273 and previous config saved to /var/cache/conftool/dbconfig/20221003-072741-root.json
  • 07:22 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 5%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35272 and previous config saved to /var/cache/conftool/dbconfig/20221003-072204-root.json
  • 07:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35271 and previous config saved to /var/cache/conftool/dbconfig/20221003-072122-root.json
  • 07:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35270 and previous config saved to /var/cache/conftool/dbconfig/20221003-072041-root.json
  • 07:19 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35269 and previous config saved to /var/cache/conftool/dbconfig/20221003-071936-root.json
  • 07:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35268 and previous config saved to /var/cache/conftool/dbconfig/20221003-071236-root.json
  • 07:07 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 3%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35267 and previous config saved to /var/cache/conftool/dbconfig/20221003-070659-root.json
  • 07:06 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35266 and previous config saved to /var/cache/conftool/dbconfig/20221003-070617-root.json
  • 07:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35265 and previous config saved to /var/cache/conftool/dbconfig/20221003-070536-root.json
  • 07:04 marostegui@cumin1001: dbctl commit (dc=all): 'db2175 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35264 and previous config saved to /var/cache/conftool/dbconfig/20221003-070431-root.json
  • 06:58 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db2175', diff saved to https://phabricator.wikimedia.org/P35263 and previous config saved to /var/cache/conftool/dbconfig/20221003-065844-root.json
  • 06:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35262 and previous config saved to /var/cache/conftool/dbconfig/20221003-065731-root.json
  • 06:52 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 6128
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1182 (re)pooling @ 1%: After upgrade to 10.6', diff saved to https://phabricator.wikimedia.org/P35261 and previous config saved to /var/cache/conftool/dbconfig/20221003-065154-root.json
  • 06:51 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35260 and previous config saved to /var/cache/conftool/dbconfig/20221003-065112-root.json
  • 06:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35259 and previous config saved to /var/cache/conftool/dbconfig/20221003-065031-root.json
  • 06:48 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 6128
  • 06:46 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1182', diff saved to https://phabricator.wikimedia.org/P35258 and previous config saved to /var/cache/conftool/dbconfig/20221003-064638-root.json
  • 06:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35257 and previous config saved to /var/cache/conftool/dbconfig/20221003-064226-root.json
  • 06:36 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35256 and previous config saved to /var/cache/conftool/dbconfig/20221003-063607-root.json
  • 06:35 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35255 and previous config saved to /var/cache/conftool/dbconfig/20221003-063527-root.json
  • 06:30 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 11039
  • 06:30 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 11039
  • 06:27 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35254 and previous config saved to /var/cache/conftool/dbconfig/20221003-062721-root.json
  • 06:27 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 5400
  • 06:26 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 5400
  • 06:21 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35253 and previous config saved to /var/cache/conftool/dbconfig/20221003-062102-root.json
  • 06:20 marostegui@cumin1001: dbctl commit (dc=all): 'db1158 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35252 and previous config saved to /var/cache/conftool/dbconfig/20221003-062022-root.json
  • 06:15 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 3300
  • 06:13 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 3300
  • 06:12 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35251 and previous config saved to /var/cache/conftool/dbconfig/20221003-061216-root.json
  • 06:07 ayounsi@cumin1001: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15133
  • 06:05 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35250 and previous config saved to /var/cache/conftool/dbconfig/20221003-060557-root.json
  • 06:04 ayounsi@cumin1001: START - Cookbook sre.network.peering with action 'configure' for AS: 15133
  • 05:57 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 3%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35249 and previous config saved to /var/cache/conftool/dbconfig/20221003-055711-root.json
  • 05:54 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1158', diff saved to https://phabricator.wikimedia.org/P35248 and previous config saved to /var/cache/conftool/dbconfig/20221003-055401-root.json
  • 05:50 marostegui@cumin1001: dbctl commit (dc=all): 'db1167 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35247 and previous config saved to /var/cache/conftool/dbconfig/20221003-055052-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1167', diff saved to https://phabricator.wikimedia.org/P35246 and previous config saved to /var/cache/conftool/dbconfig/20221003-054245-root.json
  • 05:42 marostegui@cumin1001: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: After upgrade', diff saved to https://phabricator.wikimedia.org/P35245 and previous config saved to /var/cache/conftool/dbconfig/20221003-054206-root.json
  • 05:29 marostegui@cumin1001: dbctl commit (dc=all): 'Depool db1179', diff saved to https://phabricator.wikimedia.org/P35244 and previous config saved to /var/cache/conftool/dbconfig/20221003-052927-root.json

2022-10-02

  • 08:13 elukey: `apt-get clean` on an-airflow1001 to free some space on the root partition

2022-10-01

  • 13:24 fab@deploy1002: Finished deploy [airflow-dags/research@44a1158]: (no justification provided) (duration: 00m 08s)
  • 13:24 fab@deploy1002: Started deploy [airflow-dags/research@44a1158]: (no justification provided)
  • 13:12 fab@deploy1002: Finished deploy [airflow-dags/research@d6b3e82]: (no justification provided) (duration: 03m 35s)
  • 13:08 fab@deploy1002: Started deploy [airflow-dags/research@d6b3e82]: (no justification provided)
