Server Admin Log/Archive 96

From Wikitech

2025-08-31

  • 13:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82281 and previous config saved to /var/cache/conftool/dbconfig/20250831-133713-ladsgroup.json
  • 13:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82280 and previous config saved to /var/cache/conftool/dbconfig/20250831-132205-ladsgroup.json
  • 13:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82279 and previous config saved to /var/cache/conftool/dbconfig/20250831-130657-ladsgroup.json
  • 12:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82278 and previous config saved to /var/cache/conftool/dbconfig/20250831-125150-ladsgroup.json
  • 12:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2238 (T402925)', diff saved to https://phabricator.wikimedia.org/P82277 and previous config saved to /var/cache/conftool/dbconfig/20250831-122416-ladsgroup.json
  • 12:24 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 12:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T402925)', diff saved to https://phabricator.wikimedia.org/P82276 and previous config saved to /var/cache/conftool/dbconfig/20250831-122353-ladsgroup.json
  • 12:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82275 and previous config saved to /var/cache/conftool/dbconfig/20250831-120846-ladsgroup.json
  • 11:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82274 and previous config saved to /var/cache/conftool/dbconfig/20250831-115338-ladsgroup.json
  • 11:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T402925)', diff saved to https://phabricator.wikimedia.org/P82273 and previous config saved to /var/cache/conftool/dbconfig/20250831-113830-ladsgroup.json
  • 11:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2226 (T402925)', diff saved to https://phabricator.wikimedia.org/P82272 and previous config saved to /var/cache/conftool/dbconfig/20250831-113606-ladsgroup.json
  • 11:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 11:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T402925)', diff saved to https://phabricator.wikimedia.org/P82271 and previous config saved to /var/cache/conftool/dbconfig/20250831-113542-ladsgroup.json
  • 11:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P82270 and previous config saved to /var/cache/conftool/dbconfig/20250831-112035-ladsgroup.json
  • 10:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2225 (T402925)', diff saved to https://phabricator.wikimedia.org/P82267 and previous config saved to /var/cache/conftool/dbconfig/20250831-102336-ladsgroup.json
  • 10:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 09:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T402925)', diff saved to https://phabricator.wikimedia.org/P82266 and previous config saved to /var/cache/conftool/dbconfig/20250831-093753-ladsgroup.json
  • 09:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2204 (T402925)', diff saved to https://phabricator.wikimedia.org/P82265 and previous config saved to /var/cache/conftool/dbconfig/20250831-093628-ladsgroup.json
  • 09:36 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82264 and previous config saved to /var/cache/conftool/dbconfig/20250831-090824-ladsgroup.json
  • 08:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82263 and previous config saved to /var/cache/conftool/dbconfig/20250831-085316-ladsgroup.json
  • 08:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82262 and previous config saved to /var/cache/conftool/dbconfig/20250831-083808-ladsgroup.json
  • 08:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82261 and previous config saved to /var/cache/conftool/dbconfig/20250831-082301-ladsgroup.json
  • 07:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82260 and previous config saved to /var/cache/conftool/dbconfig/20250831-075648-ladsgroup.json
  • 07:56 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 07:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82259 and previous config saved to /var/cache/conftool/dbconfig/20250831-075624-ladsgroup.json
  • 07:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P82258 and previous config saved to /var/cache/conftool/dbconfig/20250831-074117-ladsgroup.json
  • 07:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P82257 and previous config saved to /var/cache/conftool/dbconfig/20250831-072610-ladsgroup.json
  • 07:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82256 and previous config saved to /var/cache/conftool/dbconfig/20250831-071102-ladsgroup.json
  • 06:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82255 and previous config saved to /var/cache/conftool/dbconfig/20250831-064052-ladsgroup.json
  • 06:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 06:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T402925)', diff saved to https://phabricator.wikimedia.org/P82254 and previous config saved to /var/cache/conftool/dbconfig/20250831-064028-ladsgroup.json
  • 06:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P82253 and previous config saved to /var/cache/conftool/dbconfig/20250831-062521-ladsgroup.json
  • 06:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P82252 and previous config saved to /var/cache/conftool/dbconfig/20250831-061013-ladsgroup.json
  • 05:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T402925)', diff saved to https://phabricator.wikimedia.org/P82251 and previous config saved to /var/cache/conftool/dbconfig/20250831-055506-ladsgroup.json
  • 05:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2148 (T402925)', diff saved to https://phabricator.wikimedia.org/P82250 and previous config saved to /var/cache/conftool/dbconfig/20250831-052048-ladsgroup.json
  • 05:20 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 04:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T402925)', diff saved to https://phabricator.wikimedia.org/P82249 and previous config saved to /var/cache/conftool/dbconfig/20250831-045032-ladsgroup.json
  • 04:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P82248 and previous config saved to /var/cache/conftool/dbconfig/20250831-043524-ladsgroup.json
  • 04:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P82247 and previous config saved to /var/cache/conftool/dbconfig/20250831-042017-ladsgroup.json
  • 04:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T402925)', diff saved to https://phabricator.wikimedia.org/P82246 and previous config saved to /var/cache/conftool/dbconfig/20250831-040509-ladsgroup.json
  • 03:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1259 (T402925)', diff saved to https://phabricator.wikimedia.org/P82245 and previous config saved to /var/cache/conftool/dbconfig/20250831-033533-ladsgroup.json
  • 03:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 03:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T402925)', diff saved to https://phabricator.wikimedia.org/P82244 and previous config saved to /var/cache/conftool/dbconfig/20250831-033510-ladsgroup.json
  • 03:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P82243 and previous config saved to /var/cache/conftool/dbconfig/20250831-032003-ladsgroup.json
  • 03:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P82242 and previous config saved to /var/cache/conftool/dbconfig/20250831-030455-ladsgroup.json
  • 02:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T402925)', diff saved to https://phabricator.wikimedia.org/P82241 and previous config saved to /var/cache/conftool/dbconfig/20250831-024947-ladsgroup.json
  • 02:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1254 (T402925)', diff saved to https://phabricator.wikimedia.org/P82240 and previous config saved to /var/cache/conftool/dbconfig/20250831-022308-ladsgroup.json
  • 02:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T402925)', diff saved to https://phabricator.wikimedia.org/P82239 and previous config saved to /var/cache/conftool/dbconfig/20250831-015516-ladsgroup.json
  • 01:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P82238 and previous config saved to /var/cache/conftool/dbconfig/20250831-014009-ladsgroup.json
  • 01:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P82237 and previous config saved to /var/cache/conftool/dbconfig/20250831-012501-ladsgroup.json
  • 01:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T402925)', diff saved to https://phabricator.wikimedia.org/P82236 and previous config saved to /var/cache/conftool/dbconfig/20250831-010954-ladsgroup.json
  • 00:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1233 (T402925)', diff saved to https://phabricator.wikimedia.org/P82235 and previous config saved to /var/cache/conftool/dbconfig/20250831-004320-ladsgroup.json
  • 00:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 00:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T402925)', diff saved to https://phabricator.wikimedia.org/P82234 and previous config saved to /var/cache/conftool/dbconfig/20250831-004257-ladsgroup.json
  • 00:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P82233 and previous config saved to /var/cache/conftool/dbconfig/20250831-002750-ladsgroup.json
  • 00:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P82232 and previous config saved to /var/cache/conftool/dbconfig/20250831-001242-ladsgroup.json

2025-08-30

  • 23:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T402925)', diff saved to https://phabricator.wikimedia.org/P82231 and previous config saved to /var/cache/conftool/dbconfig/20250830-235735-ladsgroup.json
  • 23:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1229 (T402925)', diff saved to https://phabricator.wikimedia.org/P82230 and previous config saved to /var/cache/conftool/dbconfig/20250830-235521-ladsgroup.json
  • 23:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 23:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 23:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T402925)', diff saved to https://phabricator.wikimedia.org/P82229 and previous config saved to /var/cache/conftool/dbconfig/20250830-232712-ladsgroup.json
  • 23:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P82228 and previous config saved to /var/cache/conftool/dbconfig/20250830-231204-ladsgroup.json
  • 22:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P82227 and previous config saved to /var/cache/conftool/dbconfig/20250830-225656-ladsgroup.json
  • 22:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T402925)', diff saved to https://phabricator.wikimedia.org/P82226 and previous config saved to /var/cache/conftool/dbconfig/20250830-224149-ladsgroup.json
  • 22:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1197 (T402925)', diff saved to https://phabricator.wikimedia.org/P82225 and previous config saved to /var/cache/conftool/dbconfig/20250830-223936-ladsgroup.json
  • 22:39 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 22:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82224 and previous config saved to /var/cache/conftool/dbconfig/20250830-223914-ladsgroup.json
  • 22:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P82223 and previous config saved to /var/cache/conftool/dbconfig/20250830-222406-ladsgroup.json
  • 22:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P82222 and previous config saved to /var/cache/conftool/dbconfig/20250830-220859-ladsgroup.json
  • 21:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82221 and previous config saved to /var/cache/conftool/dbconfig/20250830-215351-ladsgroup.json
  • 21:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1188 (T402925)', diff saved to https://phabricator.wikimedia.org/P82220 and previous config saved to /var/cache/conftool/dbconfig/20250830-215138-ladsgroup.json
  • 21:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 21:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82219 and previous config saved to /var/cache/conftool/dbconfig/20250830-215116-ladsgroup.json
  • 21:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P82218 and previous config saved to /var/cache/conftool/dbconfig/20250830-213609-ladsgroup.json
  • 21:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P82217 and previous config saved to /var/cache/conftool/dbconfig/20250830-212101-ladsgroup.json
  • 21:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82216 and previous config saved to /var/cache/conftool/dbconfig/20250830-210554-ladsgroup.json
  • 20:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82215 and previous config saved to /var/cache/conftool/dbconfig/20250830-203611-ladsgroup.json
  • 20:36 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 20:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T402925)', diff saved to https://phabricator.wikimedia.org/P82214 and previous config saved to /var/cache/conftool/dbconfig/20250830-203548-ladsgroup.json
  • 20:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P82213 and previous config saved to /var/cache/conftool/dbconfig/20250830-202041-ladsgroup.json
  • 20:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P82212 and previous config saved to /var/cache/conftool/dbconfig/20250830-200533-ladsgroup.json
  • 19:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T402925)', diff saved to https://phabricator.wikimedia.org/P82211 and previous config saved to /var/cache/conftool/dbconfig/20250830-195026-ladsgroup.json
  • 19:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1162 (T402925)', diff saved to https://phabricator.wikimedia.org/P82210 and previous config saved to /var/cache/conftool/dbconfig/20250830-194814-ladsgroup.json
  • 19:48 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 19:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82209 and previous config saved to /var/cache/conftool/dbconfig/20250830-194751-ladsgroup.json
  • 19:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P82208 and previous config saved to /var/cache/conftool/dbconfig/20250830-193244-ladsgroup.json
  • 19:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P82207 and previous config saved to /var/cache/conftool/dbconfig/20250830-191736-ladsgroup.json
  • 19:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82206 and previous config saved to /var/cache/conftool/dbconfig/20250830-190228-ladsgroup.json
  • 18:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82205 and previous config saved to /var/cache/conftool/dbconfig/20250830-183119-ladsgroup.json
  • 18:31 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 18:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T402925)', diff saved to https://phabricator.wikimedia.org/P82204 and previous config saved to /var/cache/conftool/dbconfig/20250830-180128-ladsgroup.json
  • 17:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P82203 and previous config saved to /var/cache/conftool/dbconfig/20250830-174620-ladsgroup.json
  • 17:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P82202 and previous config saved to /var/cache/conftool/dbconfig/20250830-173113-ladsgroup.json
  • 17:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T402925)', diff saved to https://phabricator.wikimedia.org/P82201 and previous config saved to /var/cache/conftool/dbconfig/20250830-171605-ladsgroup.json
  • 16:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2222 (T402925)', diff saved to https://phabricator.wikimedia.org/P82200 and previous config saved to /var/cache/conftool/dbconfig/20250830-165149-ladsgroup.json
  • 16:51 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 16:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82199 and previous config saved to /var/cache/conftool/dbconfig/20250830-165126-ladsgroup.json
  • 16:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P82198 and previous config saved to /var/cache/conftool/dbconfig/20250830-163619-ladsgroup.json
  • 16:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P82197 and previous config saved to /var/cache/conftool/dbconfig/20250830-162111-ladsgroup.json
  • 16:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82196 and previous config saved to /var/cache/conftool/dbconfig/20250830-160603-ladsgroup.json
  • 15:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2221 (T402925)', diff saved to https://phabricator.wikimedia.org/P82195 and previous config saved to /var/cache/conftool/dbconfig/20250830-154151-ladsgroup.json
  • 15:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 15:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82194 and previous config saved to /var/cache/conftool/dbconfig/20250830-154128-ladsgroup.json
  • 15:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P82193 and previous config saved to /var/cache/conftool/dbconfig/20250830-152621-ladsgroup.json
  • 15:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P82192 and previous config saved to /var/cache/conftool/dbconfig/20250830-151113-ladsgroup.json
  • 14:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82191 and previous config saved to /var/cache/conftool/dbconfig/20250830-145606-ladsgroup.json
  • 14:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2218 (T402925)', diff saved to https://phabricator.wikimedia.org/P82190 and previous config saved to /var/cache/conftool/dbconfig/20250830-143150-ladsgroup.json
  • 14:31 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T402925)', diff saved to https://phabricator.wikimedia.org/P82189 and previous config saved to /var/cache/conftool/dbconfig/20250830-143127-ladsgroup.json
  • 14:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P82188 and previous config saved to /var/cache/conftool/dbconfig/20250830-141619-ladsgroup.json
  • 14:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P82187 and previous config saved to /var/cache/conftool/dbconfig/20250830-140112-ladsgroup.json
  • 13:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T402925)', diff saved to https://phabricator.wikimedia.org/P82186 and previous config saved to /var/cache/conftool/dbconfig/20250830-134604-ladsgroup.json
  • 13:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2208 (T402925)', diff saved to https://phabricator.wikimedia.org/P82185 and previous config saved to /var/cache/conftool/dbconfig/20250830-132148-ladsgroup.json
  • 13:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 12:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 12:33 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82183 and previous config saved to /var/cache/conftool/dbconfig/20250830-123249-ladsgroup.json
  • 12:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P82182 and previous config saved to /var/cache/conftool/dbconfig/20250830-121741-ladsgroup.json
  • 12:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P82181 and previous config saved to /var/cache/conftool/dbconfig/20250830-120234-ladsgroup.json
  • 11:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82180 and previous config saved to /var/cache/conftool/dbconfig/20250830-114726-ladsgroup.json
  • 11:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2182 (T402925)', diff saved to https://phabricator.wikimedia.org/P82179 and previous config saved to /var/cache/conftool/dbconfig/20250830-111913-ladsgroup.json
  • 11:19 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T402925)', diff saved to https://phabricator.wikimedia.org/P82178 and previous config saved to /var/cache/conftool/dbconfig/20250830-111900-ladsgroup.json
  • 11:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P82177 and previous config saved to /var/cache/conftool/dbconfig/20250830-110353-ladsgroup.json
  • 10:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P82176 and previous config saved to /var/cache/conftool/dbconfig/20250830-104845-ladsgroup.json
  • 10:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T402925)', diff saved to https://phabricator.wikimedia.org/P82175 and previous config saved to /var/cache/conftool/dbconfig/20250830-103338-ladsgroup.json
  • 10:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2168 (T402925)', diff saved to https://phabricator.wikimedia.org/P82174 and previous config saved to /var/cache/conftool/dbconfig/20250830-100630-ladsgroup.json
  • 10:06 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 10:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T402925)', diff saved to https://phabricator.wikimedia.org/P82173 and previous config saved to /var/cache/conftool/dbconfig/20250830-100606-ladsgroup.json
  • 09:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P82172 and previous config saved to /var/cache/conftool/dbconfig/20250830-095059-ladsgroup.json
  • 09:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P82171 and previous config saved to /var/cache/conftool/dbconfig/20250830-093552-ladsgroup.json
  • 09:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T402925)', diff saved to https://phabricator.wikimedia.org/P82170 and previous config saved to /var/cache/conftool/dbconfig/20250830-092044-ladsgroup.json
  • 08:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2159 (T402925)', diff saved to https://phabricator.wikimedia.org/P82169 and previous config saved to /var/cache/conftool/dbconfig/20250830-085311-ladsgroup.json
  • 08:53 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 08:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T402925)', diff saved to https://phabricator.wikimedia.org/P82168 and previous config saved to /var/cache/conftool/dbconfig/20250830-085248-ladsgroup.json
  • 08:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P82167 and previous config saved to /var/cache/conftool/dbconfig/20250830-083741-ladsgroup.json
  • 08:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P82166 and previous config saved to /var/cache/conftool/dbconfig/20250830-082233-ladsgroup.json
  • 08:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T402925)', diff saved to https://phabricator.wikimedia.org/P82165 and previous config saved to /var/cache/conftool/dbconfig/20250830-080726-ladsgroup.json
  • 07:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2150 (T402925)', diff saved to https://phabricator.wikimedia.org/P82164 and previous config saved to /var/cache/conftool/dbconfig/20250830-073953-ladsgroup.json
  • 07:39 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 07:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T402925)', diff saved to https://phabricator.wikimedia.org/P82163 and previous config saved to /var/cache/conftool/dbconfig/20250830-073921-ladsgroup.json
  • 07:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P82162 and previous config saved to /var/cache/conftool/dbconfig/20250830-072414-ladsgroup.json
  • 07:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P82161 and previous config saved to /var/cache/conftool/dbconfig/20250830-070906-ladsgroup.json
  • 06:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T402925)', diff saved to https://phabricator.wikimedia.org/P82160 and previous config saved to /var/cache/conftool/dbconfig/20250830-065358-ladsgroup.json
  • 06:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1253 (T402925)', diff saved to https://phabricator.wikimedia.org/P82159 and previous config saved to /var/cache/conftool/dbconfig/20250830-065046-ladsgroup.json
  • 06:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 06:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82158 and previous config saved to /var/cache/conftool/dbconfig/20250830-065023-ladsgroup.json
  • 06:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P82157 and previous config saved to /var/cache/conftool/dbconfig/20250830-063515-ladsgroup.json
  • 06:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P82156 and previous config saved to /var/cache/conftool/dbconfig/20250830-062007-ladsgroup.json
  • 06:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82155 and previous config saved to /var/cache/conftool/dbconfig/20250830-060459-ladsgroup.json
  • 05:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82154 and previous config saved to /var/cache/conftool/dbconfig/20250830-054034-ladsgroup.json
  • 05:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 05:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T402925)', diff saved to https://phabricator.wikimedia.org/P82153 and previous config saved to /var/cache/conftool/dbconfig/20250830-054011-ladsgroup.json
  • 05:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P82152 and previous config saved to /var/cache/conftool/dbconfig/20250830-052503-ladsgroup.json
  • 05:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P82151 and previous config saved to /var/cache/conftool/dbconfig/20250830-050956-ladsgroup.json
  • 04:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T402925)', diff saved to https://phabricator.wikimedia.org/P82150 and previous config saved to /var/cache/conftool/dbconfig/20250830-045448-ladsgroup.json
  • 04:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1202 (T402925)', diff saved to https://phabricator.wikimedia.org/P82149 and previous config saved to /var/cache/conftool/dbconfig/20250830-044936-ladsgroup.json
  • 04:49 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 04:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82148 and previous config saved to /var/cache/conftool/dbconfig/20250830-044913-ladsgroup.json
  • 04:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P82147 and previous config saved to /var/cache/conftool/dbconfig/20250830-043406-ladsgroup.json
  • 04:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P82146 and previous config saved to /var/cache/conftool/dbconfig/20250830-041858-ladsgroup.json
  • 04:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82145 and previous config saved to /var/cache/conftool/dbconfig/20250830-040350-ladsgroup.json
  • 04:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82144 and previous config saved to /var/cache/conftool/dbconfig/20250830-040139-ladsgroup.json
  • 04:01 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 04:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T402925)', diff saved to https://phabricator.wikimedia.org/P82143 and previous config saved to /var/cache/conftool/dbconfig/20250830-040116-ladsgroup.json
  • 03:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P82142 and previous config saved to /var/cache/conftool/dbconfig/20250830-034609-ladsgroup.json
  • 03:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P82141 and previous config saved to /var/cache/conftool/dbconfig/20250830-033101-ladsgroup.json
  • 03:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T402925)', diff saved to https://phabricator.wikimedia.org/P82140 and previous config saved to /var/cache/conftool/dbconfig/20250830-031553-ladsgroup.json
  • 02:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1191 (T402925)', diff saved to https://phabricator.wikimedia.org/P82139 and previous config saved to /var/cache/conftool/dbconfig/20250830-025740-ladsgroup.json
  • 02:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 02:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T402925)', diff saved to https://phabricator.wikimedia.org/P82138 and previous config saved to /var/cache/conftool/dbconfig/20250830-025717-ladsgroup.json
  • 02:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P82137 and previous config saved to /var/cache/conftool/dbconfig/20250830-024210-ladsgroup.json
  • 02:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P82136 and previous config saved to /var/cache/conftool/dbconfig/20250830-022702-ladsgroup.json
  • 02:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T402925)', diff saved to https://phabricator.wikimedia.org/P82135 and previous config saved to /var/cache/conftool/dbconfig/20250830-021154-ladsgroup.json
  • 02:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1181 (T402925)', diff saved to https://phabricator.wikimedia.org/P82134 and previous config saved to /var/cache/conftool/dbconfig/20250830-020744-ladsgroup.json
  • 02:07 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 02:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82133 and previous config saved to /var/cache/conftool/dbconfig/20250830-020720-ladsgroup.json
  • 01:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P82132 and previous config saved to /var/cache/conftool/dbconfig/20250830-015213-ladsgroup.json
  • 01:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P82131 and previous config saved to /var/cache/conftool/dbconfig/20250830-013705-ladsgroup.json
  • 01:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82130 and previous config saved to /var/cache/conftool/dbconfig/20250830-012158-ladsgroup.json
  • 01:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1174 (T402925)', diff saved to https://phabricator.wikimedia.org/P82129 and previous config saved to /var/cache/conftool/dbconfig/20250830-011446-ladsgroup.json
  • 01:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 00:47 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 00:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82128 and previous config saved to /var/cache/conftool/dbconfig/20250830-004659-ladsgroup.json
  • 00:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P82127 and previous config saved to /var/cache/conftool/dbconfig/20250830-003151-ladsgroup.json
  • 00:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P82126 and previous config saved to /var/cache/conftool/dbconfig/20250830-001644-ladsgroup.json
  • 00:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82125 and previous config saved to /var/cache/conftool/dbconfig/20250830-000136-ladsgroup.json

2025-08-29

  • 23:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1170 (T402925)', diff saved to https://phabricator.wikimedia.org/P82124 and previous config saved to /var/cache/conftool/dbconfig/20250829-233348-ladsgroup.json
  • 23:33 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 23:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T402925)', diff saved to https://phabricator.wikimedia.org/P82123 and previous config saved to /var/cache/conftool/dbconfig/20250829-233324-ladsgroup.json
  • 23:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P82122 and previous config saved to /var/cache/conftool/dbconfig/20250829-231817-ladsgroup.json
  • 23:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P82121 and previous config saved to /var/cache/conftool/dbconfig/20250829-230309-ladsgroup.json
  • 22:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T402925)', diff saved to https://phabricator.wikimedia.org/P82120 and previous config saved to /var/cache/conftool/dbconfig/20250829-224802-ladsgroup.json
  • 22:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es2049.codfw.wmnet with reason: Being provisioned
  • 22:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1158 (T402925)', diff saved to https://phabricator.wikimedia.org/P82119 and previous config saved to /var/cache/conftool/dbconfig/20250829-224151-ladsgroup.json
  • 22:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:41 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 21:55 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1013.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:54 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps1013
  • 21:53 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps1013
  • 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1013 - vriley@cumin1003"
  • 21:50 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1013 - vriley@cumin1003"
  • 21:42 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:42 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:40 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:39 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:34 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:30 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:51 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host deploy2003.codfw.wmnet with OS bookworm
  • 16:50 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:49 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:45 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2011.codfw.wmnet with OS bookworm
  • 16:45 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:44 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:31 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2014.codfw.wmnet with OS bookworm
  • 16:31 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:30 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:27 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 16:26 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2013.codfw.wmnet with OS bookworm
  • 16:26 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:24 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 16:22 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2011.codfw.wmnet with reason: host reimage
  • 16:12 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 16:08 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 16:03 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2014.codfw.wmnet with reason: host reimage
  • 16:03 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 16:02 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2013.codfw.wmnet with reason: host reimage
  • 15:54 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host maps2011.codfw.wmnet with OS bookworm
  • 15:45 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host maps2014.codfw.wmnet with OS bookworm
  • 15:44 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host maps2013.codfw.wmnet with OS bookworm
  • 15:43 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:42 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:24 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps2012.codfw.wmnet with OS bookworm
  • 15:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 14:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps2012.codfw.wmnet with reason: host reimage
  • 14:52 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:46 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm
  • 14:36 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:30 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:23 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 13:47 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:43 dcausse: restarting blazegraph wdqs on wdqs1022 (stuck)
  • 13:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 13:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82113 and previous config saved to /var/cache/conftool/dbconfig/20250829-134308-ladsgroup.json
  • 13:36 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82112 and previous config saved to /var/cache/conftool/dbconfig/20250829-132800-ladsgroup.json
  • 13:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 13:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P82110 and previous config saved to /var/cache/conftool/dbconfig/20250829-131253-ladsgroup.json
  • 13:12 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=codfw
  • 13:06 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1013.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 12:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82109 and previous config saved to /var/cache/conftool/dbconfig/20250829-125745-ladsgroup.json
  • 12:41 arnaudb@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Update
  • 12:35 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=eqiad
  • 12:33 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:33 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host failoid2003.codfw.wmnet with OS trixie
  • 12:28 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:28 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:22 pfischer@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:22 pfischer@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on failoid2003.codfw.wmnet with reason: host reimage
  • 12:12 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on failoid2003.codfw.wmnet with reason: host reimage
  • 11:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host failoid2003.codfw.wmnet with OS trixie
  • 11:49 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:49 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2227 (T402925)', diff saved to https://phabricator.wikimedia.org/P82108 and previous config saved to /var/cache/conftool/dbconfig/20250829-114850-ladsgroup.json
  • 11:48 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 11:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T402925)', diff saved to https://phabricator.wikimedia.org/P82107 and previous config saved to /var/cache/conftool/dbconfig/20250829-114827-ladsgroup.json
  • 11:42 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:42 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:42 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:41 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82106 and previous config saved to /var/cache/conftool/dbconfig/20250829-113320-ladsgroup.json
  • 11:29 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:29 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P82105 and previous config saved to /var/cache/conftool/dbconfig/20250829-111812-ladsgroup.json
  • 11:18 pfischer@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:16 pfischer@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T402925)', diff saved to https://phabricator.wikimedia.org/P82104 and previous config saved to /var/cache/conftool/dbconfig/20250829-110304-ladsgroup.json
  • 10:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host failoid2003.codfw.wmnet with OS trixie
  • 10:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host failoid2003.codfw.wmnet with OS trixie
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host failoid1003.eqiad.wmnet
  • 10:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host failoid1003.eqiad.wmnet with OS trixie
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on failoid1003.eqiad.wmnet with reason: host reimage
  • 09:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2209 (T402925)', diff saved to https://phabricator.wikimedia.org/P82097 and previous config saved to /var/cache/conftool/dbconfig/20250829-095653-ladsgroup.json
  • 09:56 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82096 and previous config saved to /var/cache/conftool/dbconfig/20250829-095631-ladsgroup.json
  • 09:56 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:56 taavi: taavi@doc2003 ~ $ sudo rm -rf /srv/doc/cloud/cloud-vps/terraform-cloudvps/ # T403178
  • 09:55 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on failoid1003.eqiad.wmnet with reason: host reimage
  • 09:51 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:48 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:43 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 09:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82095 and previous config saved to /var/cache/conftool/dbconfig/20250829-094123-ladsgroup.json
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host failoid1003.eqiad.wmnet with OS trixie
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM failoid1003.eqiad.wmnet - jmm@cumin2002"
  • 09:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM failoid1003.eqiad.wmnet - jmm@cumin2002"
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) failoid1003.eqiad.wmnet on all recursors
  • 09:31 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache failoid1003.eqiad.wmnet on all recursors
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM failoid1003.eqiad.wmnet - jmm@cumin2002"
  • 09:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM failoid1003.eqiad.wmnet - jmm@cumin2002"
  • 09:30 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:27 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:27 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host failoid1003.eqiad.wmnet
  • 09:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P82094 and previous config saved to /var/cache/conftool/dbconfig/20250829-092615-ladsgroup.json
  • 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host failoid2003.codfw.wmnet
  • 09:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host failoid2003.codfw.wmnet with OS trixie
  • 09:25 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host failoid2003.codfw.wmnet with OS trixie
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add failoid2003 - jmm@cumin2002"
  • 09:24 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add failoid2003 - jmm@cumin2002"
  • 09:24 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:24 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 09:16 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM failoid2003.codfw.wmnet - jmm@cumin2002"
  • 09:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM failoid2003.codfw.wmnet - jmm@cumin2002"
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) failoid2003.codfw.wmnet on all recursors
  • 09:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache failoid2003.codfw.wmnet on all recursors
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM failoid2003.codfw.wmnet - jmm@cumin2002"
  • 09:13 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM failoid2003.codfw.wmnet - jmm@cumin2002"
  • 09:11 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 09:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82093 and previous config saved to /var/cache/conftool/dbconfig/20250829-091108-ladsgroup.json
  • 09:08 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host failoid2003.codfw.wmnet
  • 08:53 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1012.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:51 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:51 vriley@cumin1003: START - Cookbook sre.hosts.provision for host maps1011.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:50 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps1012
  • 08:49 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:49 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps1012
  • 08:48 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps1011
  • 08:48 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 08:47 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps1011
  • 08:47 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 08:44 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1012 - vriley@cumin1003"
  • 08:44 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt maps1012 - vriley@cumin1003"
  • 08:40 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 08:40 vriley@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 08:38 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 08:27 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:22 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:19 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on install2004.wikimedia.org with reason: being replaced by install2005
  • 08:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2194 (T402925)', diff saved to https://phabricator.wikimedia.org/P82092 and previous config saved to /var/cache/conftool/dbconfig/20250829-080216-ladsgroup.json
  • 08:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 08:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82091 and previous config saved to /var/cache/conftool/dbconfig/20250829-080153-ladsgroup.json
  • 07:49 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82090 and previous config saved to /var/cache/conftool/dbconfig/20250829-074645-ladsgroup.json
  • 07:46 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P82089 and previous config saved to /var/cache/conftool/dbconfig/20250829-073138-ladsgroup.json
  • 07:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82088 and previous config saved to /var/cache/conftool/dbconfig/20250829-071630-ladsgroup.json
  • 06:13 arnaudb@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Update
  • 06:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2190 (T402925)', diff saved to https://phabricator.wikimedia.org/P82087 and previous config saved to /var/cache/conftool/dbconfig/20250829-060644-ladsgroup.json
  • 06:06 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T402925)', diff saved to https://phabricator.wikimedia.org/P82086 and previous config saved to /var/cache/conftool/dbconfig/20250829-060621-ladsgroup.json
  • 05:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82085 and previous config saved to /var/cache/conftool/dbconfig/20250829-055113-ladsgroup.json
  • 05:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P82084 and previous config saved to /var/cache/conftool/dbconfig/20250829-053606-ladsgroup.json
  • 05:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T402925)', diff saved to https://phabricator.wikimedia.org/P82083 and previous config saved to /var/cache/conftool/dbconfig/20250829-052059-ladsgroup.json
  • 04:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2177 (T402925)', diff saved to https://phabricator.wikimedia.org/P82082 and previous config saved to /var/cache/conftool/dbconfig/20250829-040849-ladsgroup.json
  • 04:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 04:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82081 and previous config saved to /var/cache/conftool/dbconfig/20250829-040826-ladsgroup.json
  • 03:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82080 and previous config saved to /var/cache/conftool/dbconfig/20250829-035319-ladsgroup.json
  • 03:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P82079 and previous config saved to /var/cache/conftool/dbconfig/20250829-033811-ladsgroup.json
  • 03:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82078 and previous config saved to /var/cache/conftool/dbconfig/20250829-032304-ladsgroup.json
  • 02:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2156 (T402925)', diff saved to https://phabricator.wikimedia.org/P82077 and previous config saved to /var/cache/conftool/dbconfig/20250829-021120-ladsgroup.json
  • 02:11 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 02:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T402925)', diff saved to https://phabricator.wikimedia.org/P82076 and previous config saved to /var/cache/conftool/dbconfig/20250829-021056-ladsgroup.json
  • 01:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82075 and previous config saved to /var/cache/conftool/dbconfig/20250829-015549-ladsgroup.json
  • 01:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P82074 and previous config saved to /var/cache/conftool/dbconfig/20250829-014041-ladsgroup.json
  • 01:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T402925)', diff saved to https://phabricator.wikimedia.org/P82073 and previous config saved to /var/cache/conftool/dbconfig/20250829-012534-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 47s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2149 (T402925)', diff saved to https://phabricator.wikimedia.org/P82072 and previous config saved to /var/cache/conftool/dbconfig/20250829-001328-ladsgroup.json
  • 00:13 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance

2025-08-28

  • 23:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:22 krinkle@deploy1003: Finished scap sync-world: Backport for Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595) (duration: 10m 54s)
  • 23:20 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm
  • 23:16 krinkle@deploy1003: krinkle: Continuing with sync
  • 23:16 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 23:16 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 23:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:13 krinkle@deploy1003: krinkle: Backport for Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:13 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 23:12 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 23:12 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 23:11 krinkle@deploy1003: Started scap sync-world: Backport for Enable wmgUseMdotRouting in Beta Cluster for testwiki only (T401595)
  • 23:08 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 23:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 23:05 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 22:56 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 22:55 rzl@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 22:36 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1182945 (duration: 06m 50s)
  • 22:31 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1182945
  • 22:18 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 22:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82071 and previous config saved to /var/cache/conftool/dbconfig/20250828-221753-ladsgroup.json
  • 22:11 maryum: Security deploy for T403093
  • 22:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P82070 and previous config saved to /var/cache/conftool/dbconfig/20250828-220245-ladsgroup.json
  • 22:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 21:59 maryum: Deployed security fix for T402313
  • 21:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P82069 and previous config saved to /var/cache/conftool/dbconfig/20250828-214737-ladsgroup.json
  • 21:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82068 and previous config saved to /var/cache/conftool/dbconfig/20250828-213230-ladsgroup.json
  • 21:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T402925)', diff saved to https://phabricator.wikimedia.org/P82067 and previous config saved to /var/cache/conftool/dbconfig/20250828-212609-ladsgroup.json
  • 21:26 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 21:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82066 and previous config saved to /var/cache/conftool/dbconfig/20250828-212528-ladsgroup.json
  • 21:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P82065 and previous config saved to /var/cache/conftool/dbconfig/20250828-211021-ladsgroup.json
  • 20:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P82064 and previous config saved to /var/cache/conftool/dbconfig/20250828-205513-ladsgroup.json
  • 20:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82063 and previous config saved to /var/cache/conftool/dbconfig/20250828-204006-ladsgroup.json
  • 20:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1198 (T402925)', diff saved to https://phabricator.wikimedia.org/P82062 and previous config saved to /var/cache/conftool/dbconfig/20250828-203413-ladsgroup.json
  • 20:34 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 20:33 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82061 and previous config saved to /var/cache/conftool/dbconfig/20250828-203350-ladsgroup.json
  • 20:20 dancy@deploy1003: Finished scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324) (duration: 13m 56s)
  • 20:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P82060 and previous config saved to /var/cache/conftool/dbconfig/20250828-201843-ladsgroup.json
  • 20:15 dancy@deploy1003: hokwelum, dancy: Continuing with sync
  • 20:10 dancy@deploy1003: hokwelum, dancy: Backport for Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 dancy@deploy1003: Started scap sync-world: Backport for Set $wgPHPSessionHandling to 'disable' on group0 wikis (T362324)
  • 20:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P82059 and previous config saved to /var/cache/conftool/dbconfig/20250828-200335-ladsgroup.json
  • 19:54 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2010.codfw.wmnet with reason: sleep test
  • 19:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82058 and previous config saved to /var/cache/conftool/dbconfig/20250828-194828-ladsgroup.json
  • 19:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T402925)', diff saved to https://phabricator.wikimedia.org/P82057 and previous config saved to /var/cache/conftool/dbconfig/20250828-194234-ladsgroup.json
  • 19:42 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 19:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82056 and previous config saved to /var/cache/conftool/dbconfig/20250828-194211-ladsgroup.json
  • 19:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P82055 and previous config saved to /var/cache/conftool/dbconfig/20250828-192704-ladsgroup.json
  • 19:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P82054 and previous config saved to /var/cache/conftool/dbconfig/20250828-191156-ladsgroup.json
  • 18:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82053 and previous config saved to /var/cache/conftool/dbconfig/20250828-185648-ladsgroup.json
  • 18:53 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2006.codfw.wmnet with reason: sleep test
  • 18:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T402925)', diff saved to https://phabricator.wikimedia.org/P82052 and previous config saved to /var/cache/conftool/dbconfig/20250828-185053-ladsgroup.json
  • 18:50 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 18:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82051 and previous config saved to /var/cache/conftool/dbconfig/20250828-185031-ladsgroup.json
  • 18:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P82050 and previous config saved to /var/cache/conftool/dbconfig/20250828-183523-ladsgroup.json
  • 18:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P82049 and previous config saved to /var/cache/conftool/dbconfig/20250828-182016-ladsgroup.json
  • 18:05 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1011-1015].eqiad.wmnet
  • 18:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82048 and previous config saved to /var/cache/conftool/dbconfig/20250828-180508-ladsgroup.json
  • 18:05 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:05 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1011-1015].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 17:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T402925)', diff saved to https://phabricator.wikimedia.org/P82047 and previous config saved to /var/cache/conftool/dbconfig/20250828-175915-ladsgroup.json
  • 17:59 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 17:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82046 and previous config saved to /var/cache/conftool/dbconfig/20250828-175852-ladsgroup.json
  • 17:44 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1011-1015].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 17:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P82045 and previous config saved to /var/cache/conftool/dbconfig/20250828-174345-ladsgroup.json
  • 17:31 andrew@cumin2002: START - Cookbook sre.dns.netbox
  • 17:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P82044 and previous config saved to /var/cache/conftool/dbconfig/20250828-172837-ladsgroup.json
  • 17:28 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:27 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:27 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:26 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:23 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:22 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82043 and previous config saved to /var/cache/conftool/dbconfig/20250828-171330-ladsgroup.json
  • 17:12 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 17:12 rzl@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 17:10 andrew@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1011-1015].eqiad.wmnet
  • 17:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T402925)', diff saved to https://phabricator.wikimedia.org/P82042 and previous config saved to /var/cache/conftool/dbconfig/20250828-170736-ladsgroup.json
  • 17:07 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 17:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1010.eqiad.wmnet
  • 17:07 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:07 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 17:01 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy improvements to policy validation handling - swfrench@cumin2002"
  • 17:01 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy improvements to policy validation handling - swfrench@cumin2002
  • 17:01 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy improvements to policy validation handling - swfrench@cumin2002
  • 17:01 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy improvements to policy validation handling - swfrench@cumin2002"
  • 16:36 amastilovic@deploy1003: Finished deploy [analytics/refinery@b967872] (thin): Urgent updates to Sqoop THIN [analytics/refinery@b9678720] (duration: 00m 49s)
  • 16:35 amastilovic@deploy1003: Started deploy [analytics/refinery@b967872] (thin): Urgent updates to Sqoop THIN [analytics/refinery@b9678720]
  • 16:34 amastilovic@deploy1003: Finished deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] (duration: 00m 36s)
  • 16:34 amastilovic@deploy1003: Started deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720]
  • 16:30 amastilovic@deploy1003: Finished deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720] (duration: 02m 01s)
  • 16:28 amastilovic@deploy1003: Started deploy [analytics/refinery@b967872]: Urgent updates to Sqoop [analytics/refinery@b9678720]
  • 16:28 amastilovic@deploy1003: Finished deploy [analytics/refinery@b967872] (hadoop-test): Urgent updates to Sqoop TEST [analytics/refinery@b9678720] (duration: 00m 51s)
  • 16:27 amastilovic@deploy1003: Started deploy [analytics/refinery@b967872] (hadoop-test): Urgent updates to Sqoop TEST [analytics/refinery@b9678720]
  • 16:16 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 16:09 andrew@cumin2002: START - Cookbook sre.dns.netbox
  • 16:08 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 andrew@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1010.eqiad.wmnet
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd[1005-1009].eqiad.wmnet
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:02 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1005-1009].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 16:02 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd[1005-1009].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 15:57 andrew@cumin2002: START - Cookbook sre.dns.netbox
  • 15:55 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:54 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:54 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:54 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:54 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:54 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:43 mforns@deploy1003: Finished deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 59m 46s)
  • 15:41 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Set a default value for remoteip (T379179) (duration: 10m 24s)
  • 15:36 kharlan@deploy1003: kharlan: Continuing with sync
  • 15:35 vgutierrez: repool cp7001
  • 15:35 andrew@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd[1005-1009].eqiad.wmnet
  • 15:34 kharlan@deploy1003: kharlan: Backport for hCaptcha: Set a default value for remoteip (T379179) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:31 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS bookworm
  • 15:31 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:30 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Set a default value for remoteip (T379179)
  • 15:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82041 and previous config saved to /var/cache/conftool/dbconfig/20250828-152839-fceratto.json
  • 15:27 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 15:27 vgutierrez: depool cp7001 before deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1182834
  • 15:26 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1013.eqiad.wmnet with OS bookworm
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82039 and previous config saved to /var/cache/conftool/dbconfig/20250828-151331-fceratto.json
  • 15:09 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
  • 15:04 ammarpad@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=frwiki --logwiki=metawiki Chaimabiosport 'Renamed user 5e8f10bb5b5df1e083b4b7b255d947ea' # T403158
  • 15:03 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
  • 14:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P82037 and previous config saved to /var/cache/conftool/dbconfig/20250828-145824-fceratto.json
  • 14:48 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 14:44 mforns@deploy1003: Started deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3]
  • 14:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82036 and previous config saved to /var/cache/conftool/dbconfig/20250828-144316-fceratto.json
  • 14:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T401906)', diff saved to https://phabricator.wikimedia.org/P82035 and previous config saved to /var/cache/conftool/dbconfig/20250828-144050-fceratto.json
  • 14:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 14:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82034 and previous config saved to /var/cache/conftool/dbconfig/20250828-144027-fceratto.json
  • 14:32 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki # T398177 (dry run)
  • 14:32 Emperor: remove dbg packages & repool ms-fe2010 T360913
  • 14:31 mforns@deploy1003: Finished deploy [analytics/refinery@81abaad] (thin): Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 01m 08s)
  • 14:29 mforns@deploy1003: Started deploy [analytics/refinery@81abaad] (thin): Deploy automated traffic detection improvements [analytics/refinery@81abaad3]
  • 14:28 mforns@deploy1003: Finished deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3] (duration: 06m 01s)
  • 14:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82029 and previous config saved to /var/cache/conftool/dbconfig/20250828-142520-fceratto.json
  • 14:22 mforns@deploy1003: Started deploy [analytics/refinery@81abaad]: Deploy automated traffic detection improvements [analytics/refinery@81abaad3]
  • 14:22 Emperor: install python3.9-dbg for temporary debugging ms-fe2010 T360913
  • 14:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudcephosd1004.eqiad.wmnet
  • 14:22 andrew@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:22 andrew@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 14:21 andrew@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudcephosd1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin2002"
  • 14:20 elukey@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=kartotherian,name=eqiad
  • 14:20 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:18 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 14:17 andrew@cumin2002: START - Cookbook sre.dns.netbox
  • 14:14 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20250101000000 # T313900 (dry run)
  • 14:14 daimona@deploy1003: mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki metawiki --nowarn # T402239
  • 14:10 andrew@cumin2002: START - Cookbook sre.hosts.decommission for hosts cloudcephosd1004.eqiad.wmnet
  • 14:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P82027 and previous config saved to /var/cache/conftool/dbconfig/20250828-141012-fceratto.json
  • 14:08 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] wikidata: Do not disable Help panel (T400937) (duration: 09m 48s)
  • 14:06 daimona@deploy1003: mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki officewiki --nowarn # T402239
  • 14:04 daimona@deploy1003: mwscript-k8s job started: CampaignEvents:UpdateCountriesColumn --wiki test2wiki --nowarn # T402239
  • 14:03 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 14:02 urbanecm@deploy1003: urbanecm: Backport for [Growth] wikidata: Do not disable Help panel (T400937) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:01 Daimona: mwscript-k8s --comment="T402239" -f -- CampaignEvents:UpdateCountriesColumn --wiki testwiki
  • 13:58 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] wikidata: Do not disable Help panel (T400937)
  • 13:55 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for GrowthExperiments: remove unused wgGENewcomerTasksTopicType, Enable the CampaignEvents extension on all the remaining Wikisources (T402329) (duration: 12m 53s)
  • 13:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82022 and previous config saved to /var/cache/conftool/dbconfig/20250828-135505-fceratto.json
  • 13:54 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20240101000000 --until=20250101000000 # T313900 (dry run)
  • 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T401906)', diff saved to https://phabricator.wikimedia.org/P82021 and previous config saved to /var/cache/conftool/dbconfig/20250828-135240-fceratto.json
  • 13:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 13:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82020 and previous config saved to /var/cache/conftool/dbconfig/20250828-135217-fceratto.json
  • 13:49 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde, migr: Continuing with sync
  • 13:48 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde, migr: Backport for GrowthExperiments: remove unused wgGENewcomerTasksTopicType, Enable the CampaignEvents extension on all the remaining Wikisources (T402329) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:45 moritzm: upgrading puppetboard to Envoy 1.26.8 T402584
  • 13:42 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for GrowthExperiments: remove unused wgGENewcomerTasksTopicType, Enable the CampaignEvents extension on all the remaining Wikisources (T402329)
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P82019 and previous config saved to /var/cache/conftool/dbconfig/20250828-133709-fceratto.json
  • 13:35 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for hawiki: revert temporary logo (T376049) (duration: 10m 17s)
  • 13:34 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2009.codfw.wmnet with OS bookworm
  • 13:34 fabfur: updating haproxykafka to version 0.3.15 on A:cp (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/99) (T403174)
  • 13:33 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20230101000000 --until=20240101000000 # T313900 (dry run)
  • 13:29 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:28 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Backport for hawiki: revert temporary logo (T376049) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82018 and previous config saved to /var/cache/conftool/dbconfig/20250828-132738-ladsgroup.json
  • 13:24 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for hawiki: revert temporary logo (T376049)
  • 13:24 fabfur: updating haproxykafka to version 0.3.15 on cp3066 (test host) (https://gitlab.wikimedia.org/repos/sre/haproxykafka/-/merge_requests/99) (T403174)
  • 13:22 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes tlwiktionary --fix # T402725
  • 13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P82017 and previous config saved to /var/cache/conftool/dbconfig/20250828-132202-fceratto.json
  • 13:21 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki --since=20220310000000 --until=20230101000000 # T313900 (dry run)
  • 13:20 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for gotwiki: update wordmark and add tagline (T402706), tlwiktionary: set sitename and projectnamespace (T402725) (duration: 14m 06s)
  • 13:17 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage
  • 13:15 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:12 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Backport for gotwiki: update wordmark and add tagline (T402706), tlwiktionary: set sitename and projectnamespace (T402725) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:12 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2009.codfw.wmnet with reason: host reimage
  • 13:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P82016 and previous config saved to /var/cache/conftool/dbconfig/20250828-131231-ladsgroup.json
  • 13:11 Emperor: restart swift on ms-fe2012 T360913
  • 13:08 Emperor: restart swift on ms-fe2011 T360913
  • 13:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82015 and previous config saved to /var/cache/conftool/dbconfig/20250828-130654-fceratto.json
  • 13:06 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for gotwiki: update wordmark and add tagline (T402706), tlwiktionary: set sitename and projectnamespace (T402725)
  • 13:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T401906)', diff saved to https://phabricator.wikimedia.org/P82014 and previous config saved to /var/cache/conftool/dbconfig/20250828-130429-fceratto.json
  • 13:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 13:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 13:00 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 13:00 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:59 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:59 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:58 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:58 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:57 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P82013 and previous config saved to /var/cache/conftool/dbconfig/20250828-125724-ladsgroup.json
  • 12:51 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm
  • 12:49 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Enable processing of the risk score (T379179) (duration: 14m 13s)
  • 12:49 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 12:44 kharlan@deploy1003: kharlan: Continuing with sync
  • 12:43 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 12:42 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82012 and previous config saved to /var/cache/conftool/dbconfig/20250828-124216-ladsgroup.json
  • 12:41 kharlan@deploy1003: kharlan: Backport for hCaptcha: Enable processing of the risk score (T379179) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:37 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 12:35 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Enable processing of the risk score (T379179)
  • 12:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2224 (T402925)', diff saved to https://phabricator.wikimedia.org/P82011 and previous config saved to /var/cache/conftool/dbconfig/20250828-122608-ladsgroup.json
  • 12:26 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 12:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P82010 and previous config saved to /var/cache/conftool/dbconfig/20250828-122545-ladsgroup.json
  • 12:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T401906)', diff saved to https://phabricator.wikimedia.org/P82009 and previous config saved to /var/cache/conftool/dbconfig/20250828-121844-fceratto.json
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T401906)', diff saved to https://phabricator.wikimedia.org/P82008 and previous config saved to /var/cache/conftool/dbconfig/20250828-121720-fceratto.json
  • 12:17 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 12:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P82007 and previous config saved to /var/cache/conftool/dbconfig/20250828-121634-fceratto.json
  • 12:14 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS bookworm
  • 12:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P82006 and previous config saved to /var/cache/conftool/dbconfig/20250828-121038-ladsgroup.json
  • 12:04 moritzm: upgrading debmonitor to Envoy 1.26.8 T402584
  • 12:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82005 and previous config saved to /var/cache/conftool/dbconfig/20250828-120128-fceratto.json
  • 11:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P82003 and previous config saved to /var/cache/conftool/dbconfig/20250828-115530-ladsgroup.json
  • 11:52 Amir1: delete from recentchanges where rc_timestamp < '20250725000000'; on all.dblist (T403002)
  • 11:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P82002 and previous config saved to /var/cache/conftool/dbconfig/20250828-114620-fceratto.json
  • 11:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P82001 and previous config saved to /var/cache/conftool/dbconfig/20250828-114023-ladsgroup.json
  • 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P82000 and previous config saved to /var/cache/conftool/dbconfig/20250828-113113-fceratto.json
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T401906)', diff saved to https://phabricator.wikimedia.org/P81999 and previous config saved to /var/cache/conftool/dbconfig/20250828-112847-fceratto.json
  • 11:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81998 and previous config saved to /var/cache/conftool/dbconfig/20250828-112824-fceratto.json
  • 11:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2217 (T402925)', diff saved to https://phabricator.wikimedia.org/P81997 and previous config saved to /var/cache/conftool/dbconfig/20250828-112403-ladsgroup.json
  • 11:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81996 and previous config saved to /var/cache/conftool/dbconfig/20250828-112340-ladsgroup.json
  • 11:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P81995 and previous config saved to /var/cache/conftool/dbconfig/20250828-111317-fceratto.json
  • 11:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81994 and previous config saved to /var/cache/conftool/dbconfig/20250828-110833-ladsgroup.json
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P81993 and previous config saved to /var/cache/conftool/dbconfig/20250828-105810-fceratto.json
  • 10:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81992 and previous config saved to /var/cache/conftool/dbconfig/20250828-105326-ladsgroup.json
  • 10:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81991 and previous config saved to /var/cache/conftool/dbconfig/20250828-104302-fceratto.json
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T401906)', diff saved to https://phabricator.wikimedia.org/P81990 and previous config saved to /var/cache/conftool/dbconfig/20250828-104037-fceratto.json
  • 10:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 10:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81989 and previous config saved to /var/cache/conftool/dbconfig/20250828-104014-fceratto.json
  • 10:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81988 and previous config saved to /var/cache/conftool/dbconfig/20250828-103818-ladsgroup.json
  • 10:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P81987 and previous config saved to /var/cache/conftool/dbconfig/20250828-102507-fceratto.json
  • 10:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2214 (T402925)', diff saved to https://phabricator.wikimedia.org/P81986 and previous config saved to /var/cache/conftool/dbconfig/20250828-102229-ladsgroup.json
  • 10:22 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P81985 and previous config saved to /var/cache/conftool/dbconfig/20250828-100959-fceratto.json
  • 10:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 10:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81984 and previous config saved to /var/cache/conftool/dbconfig/20250828-100526-ladsgroup.json
  • 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81983 and previous config saved to /var/cache/conftool/dbconfig/20250828-095452-fceratto.json
  • 09:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81982 and previous config saved to /var/cache/conftool/dbconfig/20250828-095240-fceratto.json
  • 09:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 09:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81981 and previous config saved to /var/cache/conftool/dbconfig/20250828-095019-ladsgroup.json
  • 09:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81980 and previous config saved to /var/cache/conftool/dbconfig/20250828-093511-ladsgroup.json
  • 09:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81979 and previous config saved to /var/cache/conftool/dbconfig/20250828-092004-ladsgroup.json
  • 09:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2193 (T402925)', diff saved to https://phabricator.wikimedia.org/P81978 and previous config saved to /var/cache/conftool/dbconfig/20250828-091750-ladsgroup.json
  • 09:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 09:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81977 and previous config saved to /var/cache/conftool/dbconfig/20250828-091727-ladsgroup.json
  • 09:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81975 and previous config saved to /var/cache/conftool/dbconfig/20250828-090219-ladsgroup.json
  • 08:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81974 and previous config saved to /var/cache/conftool/dbconfig/20250828-084711-ladsgroup.json
  • 08:46 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 08:42 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 08:37 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 08:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81973 and previous config saved to /var/cache/conftool/dbconfig/20250828-083204-ladsgroup.json
  • 08:31 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 08:30 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-worker2001.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster
  • 08:30 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-worker2002.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster
  • 08:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81972 and previous config saved to /var/cache/conftool/dbconfig/20250828-082951-ladsgroup.json
  • 08:29 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 08:29 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81971 and previous config saved to /var/cache/conftool/dbconfig/20250828-082928-ladsgroup.json
  • 08:29 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-ctrl2002.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster
  • 08:28 stevemunene@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on dse-k8s-ctrl2001.codfw.wmnet with reason: Bootstrapping new dse-k8s-codfw-cluster
  • 08:27 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 08:19 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 08:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81970 and previous config saved to /var/cache/conftool/dbconfig/20250828-081420-ladsgroup.json
  • 08:12 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.16 refs T396377
  • 08:07 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 07:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81969 and previous config saved to /var/cache/conftool/dbconfig/20250828-075912-ladsgroup.json
  • 07:44 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81967 and previous config saved to /var/cache/conftool/dbconfig/20250828-074404-ladsgroup.json
  • 07:27 dcausse: T271776: reindexing all lexemes in wikidatawiki
  • 07:27 dcausse: T271776: reindexing all lexemes in testwikidatawiki
  • 07:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T402925)', diff saved to https://phabricator.wikimedia.org/P81966 and previous config saved to /var/cache/conftool/dbconfig/20250828-072553-ladsgroup.json
  • 07:25 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 07:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81965 and previous config saved to /var/cache/conftool/dbconfig/20250828-072530-ladsgroup.json
  • 07:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81964 and previous config saved to /var/cache/conftool/dbconfig/20250828-071022-ladsgroup.json
  • 06:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81963 and previous config saved to /var/cache/conftool/dbconfig/20250828-065515-ladsgroup.json
  • 06:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81962 and previous config saved to /var/cache/conftool/dbconfig/20250828-064007-ladsgroup.json
  • 06:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T402925)', diff saved to https://phabricator.wikimedia.org/P81961 and previous config saved to /var/cache/conftool/dbconfig/20250828-062149-ladsgroup.json
  • 06:21 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 06:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81960 and previous config saved to /var/cache/conftool/dbconfig/20250828-062127-ladsgroup.json
  • 06:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81959 and previous config saved to /var/cache/conftool/dbconfig/20250828-060620-ladsgroup.json
  • 05:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81958 and previous config saved to /var/cache/conftool/dbconfig/20250828-055112-ladsgroup.json
  • 05:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81957 and previous config saved to /var/cache/conftool/dbconfig/20250828-053605-ladsgroup.json
  • 05:18 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T402925)', diff saved to https://phabricator.wikimedia.org/P81956 and previous config saved to /var/cache/conftool/dbconfig/20250828-051805-ladsgroup.json
  • 05:17 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 05:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81955 and previous config saved to /var/cache/conftool/dbconfig/20250828-050117-ladsgroup.json
  • 04:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81954 and previous config saved to /var/cache/conftool/dbconfig/20250828-044610-ladsgroup.json
  • 04:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81953 and previous config saved to /var/cache/conftool/dbconfig/20250828-043102-ladsgroup.json
  • 04:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81952 and previous config saved to /var/cache/conftool/dbconfig/20250828-041555-ladsgroup.json
  • 04:13 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1231 (T402925)', diff saved to https://phabricator.wikimedia.org/P81951 and previous config saved to /var/cache/conftool/dbconfig/20250828-041345-ladsgroup.json
  • 04:13 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 03:57 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 03:56 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81950 and previous config saved to /var/cache/conftool/dbconfig/20250828-035653-ladsgroup.json
  • 03:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81949 and previous config saved to /var/cache/conftool/dbconfig/20250828-034146-ladsgroup.json
  • 03:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81948 and previous config saved to /var/cache/conftool/dbconfig/20250828-032638-ladsgroup.json
  • 03:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81947 and previous config saved to /var/cache/conftool/dbconfig/20250828-031131-ladsgroup.json
  • 03:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1187 (T402925)', diff saved to https://phabricator.wikimedia.org/P81946 and previous config saved to /var/cache/conftool/dbconfig/20250828-031022-ladsgroup.json
  • 03:10 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 03:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81945 and previous config saved to /var/cache/conftool/dbconfig/20250828-030959-ladsgroup.json
  • 02:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81944 and previous config saved to /var/cache/conftool/dbconfig/20250828-025451-ladsgroup.json
  • 02:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81943 and previous config saved to /var/cache/conftool/dbconfig/20250828-023944-ladsgroup.json
  • 02:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81942 and previous config saved to /var/cache/conftool/dbconfig/20250828-022436-ladsgroup.json
  • 02:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1180 (T402925)', diff saved to https://phabricator.wikimedia.org/P81941 and previous config saved to /var/cache/conftool/dbconfig/20250828-022326-ladsgroup.json
  • 02:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 02:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81940 and previous config saved to /var/cache/conftool/dbconfig/20250828-022304-ladsgroup.json
  • 02:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81939 and previous config saved to /var/cache/conftool/dbconfig/20250828-020756-ladsgroup.json
  • 01:52 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81938 and previous config saved to /var/cache/conftool/dbconfig/20250828-015249-ladsgroup.json
  • 01:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81937 and previous config saved to /var/cache/conftool/dbconfig/20250828-013741-ladsgroup.json
  • 01:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1173 (T402925)', diff saved to https://phabricator.wikimedia.org/P81936 and previous config saved to /var/cache/conftool/dbconfig/20250828-013431-ladsgroup.json
  • 01:34 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 01:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81935 and previous config saved to /var/cache/conftool/dbconfig/20250828-013408-ladsgroup.json
  • 01:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81934 and previous config saved to /var/cache/conftool/dbconfig/20250828-011900-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 43s)
  • 01:03 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81933 and previous config saved to /var/cache/conftool/dbconfig/20250828-010353-ladsgroup.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81932 and previous config saved to /var/cache/conftool/dbconfig/20250828-004845-ladsgroup.json
  • 00:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1168 (T402925)', diff saved to https://phabricator.wikimedia.org/P81931 and previous config saved to /var/cache/conftool/dbconfig/20250828-004536-ladsgroup.json
  • 00:45 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 00:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T402925)', diff saved to https://phabricator.wikimedia.org/P81930 and previous config saved to /var/cache/conftool/dbconfig/20250828-004513-ladsgroup.json
  • 00:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81929 and previous config saved to /var/cache/conftool/dbconfig/20250828-003005-ladsgroup.json
  • 00:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81928 and previous config saved to /var/cache/conftool/dbconfig/20250828-001458-ladsgroup.json

2025-08-27

  • 23:59 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T402925)', diff saved to https://phabricator.wikimedia.org/P81927 and previous config saved to /var/cache/conftool/dbconfig/20250827-235950-ladsgroup.json
  • 23:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1165 (T402925)', diff saved to https://phabricator.wikimedia.org/P81926 and previous config saved to /var/cache/conftool/dbconfig/20250827-235540-ladsgroup.json
  • 23:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 23:30 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1182666 (duration: 03m 03s)
  • 23:28 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1182666
  • 23:25 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 23:23 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 23:22 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 23:18 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 23:17 rzl@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
  • 23:17 rzl@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
  • 23:16 rzl@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
  • 23:16 rzl@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
  • 23:08 arlolra@deploy1003: Finished scap sync-world: Backport for Revert "Ensure NFC from Language::uc/ucfirst/lc/lcfirst/ucwords/ucwordbreaks" (T403113 T400057) (duration: 11m 40s)
  • 23:03 arlolra@deploy1003: arlolra, cscott: Continuing with sync
  • 23:01 arlolra@deploy1003: arlolra, cscott: Backport for Revert "Ensure NFC from Language::uc/ucfirst/lc/lcfirst/ucwords/ucwordbreaks" (T403113 T400057) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:56 arlolra@deploy1003: Started scap sync-world: Backport for Revert "Ensure NFC from Language::uc/ucfirst/lc/lcfirst/ucwords/ucwordbreaks" (T403113 T400057)
  • 22:49 jdlrobson@deploy1003: Finished scap sync-world: Backport for Consolidate search config to match Minerva (T397084) (duration: 20m 48s)
  • 22:44 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 22:34 jdlrobson@deploy1003: jdlrobson: Backport for Consolidate search config to match Minerva (T397084) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:28 jdlrobson@deploy1003: Started scap sync-world: Backport for Consolidate search config to match Minerva (T397084)
  • 21:31 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest2009.codfw.wmnet with OS bookworm
  • 21:28 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 20:50 bd808@deploy1003: Finished scap sync-world: Backport for FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177), FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177) (duration: 12m 59s)
  • 20:44 bd808@deploy1003: matmarex, bd808: Continuing with sync
  • 20:43 bd808@deploy1003: matmarex, bd808: Backport for FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177), FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:37 bd808@deploy1003: Started scap sync-world: Backport for FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177), FixRenamedUserGlobalEditCount: Add --since and --until parameters (T313900), FixRenamedUserGlobalEditCount: Improve script output (T313900), FixRenameUserLocalLogs: Old username may not be valid (T398177), FixRenameUserLocalLogs: Improve finding local log entries (T398177)
  • 20:07 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Update
  • 19:57 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Update
  • 19:52 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Update
  • 19:42 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Update
  • 19:14 sukhe@dns1004: END - running authdns-update
  • 19:13 cdobbins@dns1004: START - running authdns-update
  • 19:13 sukhe@dns1004: START - running authdns-update
  • 18:42 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host maps2011.codfw.wmnet with OS bookworm
  • 18:25 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host maps2014.codfw.wmnet with OS bookworm
  • 18:25 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host maps2013.codfw.wmnet with OS bookworm
  • 18:25 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host maps2012.codfw.wmnet with OS bookworm
  • 18:24 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host maps2011.codfw.wmnet with OS bookworm
  • 18:21 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:21 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['maps2011']
  • 18:21 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['maps2011']
  • 18:21 arnoldokoth: Upgrade envoyproxy on vrts2002 T402584
  • 18:20 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:20 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:20 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:20 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:20 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:19 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:19 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:07 taavi: reprepro: copy helmfile and helm-diff to trixie-wikimedia
  • 18:05 mutante: upgrading envoyproxy on phab2002, lists2001, contint2002 T402584
  • 18:05 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:04 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:04 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:04 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:46 mutante: upgrading envoyproxy on doc* and etherpad* hosts T402584
  • 17:40 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:40 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:40 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2012.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:39 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host maps2011.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:39 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps2014
  • 17:39 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps2013
  • 17:39 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps2012
  • 17:39 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host maps2011
  • 17:39 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps2014
  • 17:39 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps2013
  • 17:39 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps2012
  • 17:38 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host maps2011
  • 17:38 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:38 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding maps2011-2014 to codfw - jhancock@cumin1003"
  • 17:38 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding maps2011-2014 to codfw - jhancock@cumin1003"
  • 17:35 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 17:21 mutante: upgrading envoyproxy on releases* and planet* hosts T402584
  • 17:05 mutante: upgrading envoyproxy on aphlict* and zuul* hosts T402584
  • 16:58 mutante: upgrading envoyproxy on people* hosts T402584
  • 16:49 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 16:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T402925)', diff saved to https://phabricator.wikimedia.org/P81924 and previous config saved to /var/cache/conftool/dbconfig/20250827-163928-ladsgroup.json
  • 16:24 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P81923 and previous config saved to /var/cache/conftool/dbconfig/20250827-162420-ladsgroup.json
  • 16:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P81922 and previous config saved to /var/cache/conftool/dbconfig/20250827-160912-ladsgroup.json
  • 16:02 dreamyjazz@deploy1003: Finished scap sync-world: Backport for Temp accounts: Disable logged out editing on wikimaniawiki (T403067) (duration: 17m 29s)
  • 16:01 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum1002.eqiad.wmnet with OS trixie
  • 15:58 James_F: jforrester@deploy1003:~$ foreachwikiindblist wikifunctionsclient sql /srv/mediawiki-staging/php-1.45.0-wmf.16/extensions/WikiLambda/sql/mysql/table-usage.sql # T403079
  • 15:57 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 15:54 sukhe: reprepro -C main include trixie-wikimedia anycast-healthchecker_0.9.8-1+wmf13u1_amd64.changes: T401832
  • 15:54 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T402925)', diff saved to https://phabricator.wikimedia.org/P81920 and previous config saved to /var/cache/conftool/dbconfig/20250827-155405-ladsgroup.json
  • 15:51 dreamyjazz@deploy1003: dreamyjazz: Backport for Temp accounts: Disable logged out editing on wikimaniawiki (T403067) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:45 swfrench-wmf: finished etcd cfssl-PKI migration in eqiad - T352245
  • 15:45 dreamyjazz@deploy1003: Started scap sync-world: Backport for Temp accounts: Disable logged out editing on wikimaniawiki (T403067)
  • 15:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1230 (T402925)', diff saved to https://phabricator.wikimedia.org/P81918 and previous config saved to /var/cache/conftool/dbconfig/20250827-154350-ladsgroup.json
  • 15:43 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 15:43 zabe@deploy1003: Finished scap sync-world: Backport for Use cl_timestamp_id instead of cl_timestamp (T403069), Use cl_timestamp_id instead of cl_timestamp (T403069) (duration: 10m 52s)
  • 15:37 zabe@deploy1003: zabe: Continuing with sync
  • 15:36 zabe@deploy1003: zabe: Backport for Use cl_timestamp_id instead of cl_timestamp (T403069), Use cl_timestamp_id instead of cl_timestamp (T403069) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:32 zabe@deploy1003: Started scap sync-world: Backport for Use cl_timestamp_id instead of cl_timestamp (T403069), Use cl_timestamp_id instead of cl_timestamp (T403069)
  • 15:31 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T402925)', diff saved to https://phabricator.wikimedia.org/P81917 and previous config saved to /var/cache/conftool/dbconfig/20250827-153138-ladsgroup.json
  • 15:25 sgimeno@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:24 sgimeno@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:24 sgimeno@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:22 sgimeno@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:21 sgimeno@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P81916 and previous config saved to /var/cache/conftool/dbconfig/20250827-151630-ladsgroup.json
  • 15:16 sgimeno@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T401906)', diff saved to https://phabricator.wikimedia.org/P81915 and previous config saved to /var/cache/conftool/dbconfig/20250827-151546-fceratto.json
  • 15:09 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
  • 15:05 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad (T352245)
  • 15:04 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
  • 15:04 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqiad (T352245)
  • 15:03 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-serve1013.eqiad.wmnet with OS bookworm
  • 15:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P81914 and previous config saved to /var/cache/conftool/dbconfig/20250827-150123-ladsgroup.json
  • 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P81913 and previous config saved to /var/cache/conftool/dbconfig/20250827-150039-fceratto.json
  • 14:58 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 14:54 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1013.eqiad.wmnet with reason: host reimage
  • 14:53 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:48 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum1002.eqiad.wmnet with reason: host reimage
  • 14:48 jforrester@deploy1003: Finished scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part I (T397401) (duration: 11m 45s)
  • 14:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T402925)', diff saved to https://phabricator.wikimedia.org/P81912 and previous config saved to /var/cache/conftool/dbconfig/20250827-144615-ladsgroup.json
  • 14:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P81911 and previous config saved to /var/cache/conftool/dbconfig/20250827-144532-fceratto.json
  • 14:43 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:42 jforrester@deploy1003: jforrester: Backport for Enable Wikifunctions client mode on Wiktionaries, Part I (T397401) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:39 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:39 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:39 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:38 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:38 swfrench-wmf: starting etcd cfssl-PKI migration in eqiad - T352245
  • 14:38 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:37 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:36 jforrester@deploy1003: Started scap sync-world: Backport for Enable Wikifunctions client mode on Wiktionaries, Part I (T397401)
  • 14:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1207 (T402925)', diff saved to https://phabricator.wikimedia.org/P81910 and previous config saved to /var/cache/conftool/dbconfig/20250827-143602-ladsgroup.json
  • 14:35 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 14:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T402925)', diff saved to https://phabricator.wikimedia.org/P81909 and previous config saved to /var/cache/conftool/dbconfig/20250827-143539-ladsgroup.json
  • 14:33 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:32 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:32 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:31 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1013.eqiad.wmnet with OS bookworm
  • 14:31 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:31 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T401906)', diff saved to https://phabricator.wikimedia.org/P81908 and previous config saved to /var/cache/conftool/dbconfig/20250827-143024-fceratto.json
  • 14:30 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:28 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:27 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:26 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T401906)', diff saved to https://phabricator.wikimedia.org/P81907 and previous config saved to /var/cache/conftool/dbconfig/20250827-142650-fceratto.json
  • 14:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 14:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T401906)', diff saved to https://phabricator.wikimedia.org/P81906 and previous config saved to /var/cache/conftool/dbconfig/20250827-142638-fceratto.json
  • 14:26 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:26 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:24 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:24 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum1002.eqiad.wmnet with OS trixie
  • 14:24 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:24 jforrester@deploy1003: Finished scap sync-world: Backport for Wikifunctions: Enable Wikidata input types in embedded calls (T397403) (duration: 13m 37s)
  • 14:22 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:21 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P81905 and previous config saved to /var/cache/conftool/dbconfig/20250827-142032-ladsgroup.json
  • 14:19 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:18 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:18 jforrester@deploy1003: jforrester: Continuing with sync
  • 14:17 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 14:16 jforrester@deploy1003: jforrester: Backport for Wikifunctions: Enable Wikidata input types in embedded calls (T397403) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:16 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 14:15 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 14:14 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:13 eevans@deploy1003: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 14:12 brouberol@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P81903 and previous config saved to /var/cache/conftool/dbconfig/20250827-141131-fceratto.json
  • 14:11 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 urbanecm@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:10 urbanecm@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:10 jforrester@deploy1003: Started scap sync-world: Backport for Wikifunctions: Enable Wikidata input types in embedded calls (T397403)
  • 14:09 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:09 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:06 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P81902 and previous config saved to /var/cache/conftool/dbconfig/20250827-140524-ladsgroup.json
  • 14:02 ottomata: deleted eventgate-analytics staging canary release
  • 14:01 urbanecm@deploy1003: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:00 urbanecm@deploy1003: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:00 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:59 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:58 urbanecm@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:57 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:57 urbanecm@deploy1003: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:57 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P81901 and previous config saved to /var/cache/conftool/dbconfig/20250827-135623-fceratto.json
  • 13:51 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum
  • 13:51 Emperor: stop puppet/swift/rsync to vacuum large DBs on ms-be1066 T377827
  • 13:50 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T402925)', diff saved to https://phabricator.wikimedia.org/P81900 and previous config saved to /var/cache/conftool/dbconfig/20250827-135017-ladsgroup.json
  • 13:50 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 13:50 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:48 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 13:48 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 13:47 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 13:47 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 13:46 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 13:46 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 13:45 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 13:44 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 13:44 ottomata: deploying eventgate-analytics and eventgate-logging-external to pick up meta.dt change - T376026
  • 13:44 brouberol@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:44 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 13:43 brouberol@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T401906)', diff saved to https://phabricator.wikimedia.org/P81899 and previous config saved to /var/cache/conftool/dbconfig/20250827-134116-fceratto.json
  • 13:39 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:39 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1200 (T402925)', diff saved to https://phabricator.wikimedia.org/P81898 and previous config saved to /var/cache/conftool/dbconfig/20250827-133904-ladsgroup.json
  • 13:38 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 13:38 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T402925)', diff saved to https://phabricator.wikimedia.org/P81897 and previous config saved to /var/cache/conftool/dbconfig/20250827-133842-ladsgroup.json
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T401906)', diff saved to https://phabricator.wikimedia.org/P81896 and previous config saved to /var/cache/conftool/dbconfig/20250827-133741-fceratto.json
  • 13:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 13:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T401906)', diff saved to https://phabricator.wikimedia.org/P81895 and previous config saved to /var/cache/conftool/dbconfig/20250827-133729-fceratto.json
  • 13:29 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=tegola-vector-tiles,name=codfw
  • 13:28 elukey@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=kartotherian,name=codfw
  • 13:25 hashar: restarted Gerrit
  • 13:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P81894 and previous config saved to /var/cache/conftool/dbconfig/20250827-132334-ladsgroup.json
  • 13:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P81893 and previous config saved to /var/cache/conftool/dbconfig/20250827-132222-fceratto.json
  • 13:16 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Release CampaignEvents extension to all active wikisources (T402329) (duration: 12m 12s)
  • 13:10 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, mhorsey: Continuing with sync
  • 13:10 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, mhorsey: Backport for Release CampaignEvents extension to all active wikisources (T402329) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:08 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P81892 and previous config saved to /var/cache/conftool/dbconfig/20250827-130827-ladsgroup.json
  • 13:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P81891 and previous config saved to /var/cache/conftool/dbconfig/20250827-130714-fceratto.json
  • 13:03 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Release CampaignEvents extension to all active wikisources (T402329)
  • 12:59 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:59 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:58 moritzm: upgrading envoy on testreduce T402584
  • 12:54 jmm@dns1004: END - running authdns-update
  • 12:53 jmm@dns1004: START - running authdns-update
  • 12:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T402925)', diff saved to https://phabricator.wikimedia.org/P81890 and previous config saved to /var/cache/conftool/dbconfig/20250827-125319-ladsgroup.json
  • 12:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T401906)', diff saved to https://phabricator.wikimedia.org/P81889 and previous config saved to /var/cache/conftool/dbconfig/20250827-125207-fceratto.json
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T401906)', diff saved to https://phabricator.wikimedia.org/P81888 and previous config saved to /var/cache/conftool/dbconfig/20250827-124832-fceratto.json
  • 12:48 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 12:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 12:46 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 12:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T401906)', diff saved to https://phabricator.wikimedia.org/P81887 and previous config saved to /var/cache/conftool/dbconfig/20250827-124650-fceratto.json
  • 12:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1185 (T402925)', diff saved to https://phabricator.wikimedia.org/P81886 and previous config saved to /var/cache/conftool/dbconfig/20250827-123708-ladsgroup.json
  • 12:37 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 12:36 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T402925)', diff saved to https://phabricator.wikimedia.org/P81885 and previous config saved to /var/cache/conftool/dbconfig/20250827-123645-ladsgroup.json
  • 12:33 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P81884 and previous config saved to /var/cache/conftool/dbconfig/20250827-123143-fceratto.json
  • 12:28 brouberol@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:27 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve1012.eqiad.wmnet with reason: host reimage
  • 12:26 brouberol@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P81883 and previous config saved to /var/cache/conftool/dbconfig/20250827-122138-ladsgroup.json
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P81882 and previous config saved to /var/cache/conftool/dbconfig/20250827-121635-fceratto.json
  • 12:13 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 12:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2005.codfw.wmnet with OS bookworm
  • 12:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P81881 and previous config saved to /var/cache/conftool/dbconfig/20250827-120630-ladsgroup.json
  • 12:04 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host ml-serve1012.eqiad.wmnet with OS bookworm
  • 12:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T401906)', diff saved to https://phabricator.wikimedia.org/P81880 and previous config saved to /var/cache/conftool/dbconfig/20250827-120128-fceratto.json
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T401906)', diff saved to https://phabricator.wikimedia.org/P81879 and previous config saved to /var/cache/conftool/dbconfig/20250827-115854-fceratto.json
  • 11:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T401906)', diff saved to https://phabricator.wikimedia.org/P81878 and previous config saved to /var/cache/conftool/dbconfig/20250827-115843-fceratto.json
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T402925)', diff saved to https://phabricator.wikimedia.org/P81876 and previous config saved to /var/cache/conftool/dbconfig/20250827-115122-ladsgroup.json
  • 11:48 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2005.codfw.wmnet with reason: host reimage
  • 11:47 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:47 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:46 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P81875 and previous config saved to /var/cache/conftool/dbconfig/20250827-114335-fceratto.json
  • 11:43 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1161 (T402925)', diff saved to https://phabricator.wikimedia.org/P81874 and previous config saved to /var/cache/conftool/dbconfig/20250827-113754-ladsgroup.json
  • 11:37 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T402925)', diff saved to https://phabricator.wikimedia.org/P81873 and previous config saved to /var/cache/conftool/dbconfig/20250827-113714-ladsgroup.json
  • 11:34 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2005.codfw.wmnet with OS bookworm
  • 11:29 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2003.codfw.wmnet with OS bookworm
  • 11:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P81872 and previous config saved to /var/cache/conftool/dbconfig/20250827-112827-fceratto.json
  • 11:22 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P81871 and previous config saved to /var/cache/conftool/dbconfig/20250827-112206-ladsgroup.json
  • 11:15 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:13 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T401906)', diff saved to https://phabricator.wikimedia.org/P81870 and previous config saved to /var/cache/conftool/dbconfig/20250827-111320-fceratto.json
  • 11:13 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:12 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:12 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:11 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:11 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T401906)', diff saved to https://phabricator.wikimedia.org/P81869 and previous config saved to /var/cache/conftool/dbconfig/20250827-110948-fceratto.json
  • 11:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 11:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T401906)', diff saved to https://phabricator.wikimedia.org/P81868 and previous config saved to /var/cache/conftool/dbconfig/20250827-110936-fceratto.json
  • 11:09 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:09 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:08 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:07 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P81867 and previous config saved to /var/cache/conftool/dbconfig/20250827-110659-ladsgroup.json
  • 11:01 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2003.codfw.wmnet with OS bookworm
  • 11:00 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 11:00 ladsgroup@cumin1003: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 10:59 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 10:58 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS bookworm
  • 10:57 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 10:55 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS bookworm
  • 10:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P81866 and previous config saved to /var/cache/conftool/dbconfig/20250827-105428-fceratto.json
  • 10:54 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
  • 10:54 slyngs: idm1001.wikimedia.org - Update EnvoyProxy to version 1.26.8 - https://phabricator.wikimedia.org/T402584
  • 10:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T402925)', diff saved to https://phabricator.wikimedia.org/P81865 and previous config saved to /var/cache/conftool/dbconfig/20250827-105151-ladsgroup.json
  • 10:49 slyngs: idm2001.wikimedia.org - Update EnvoyProxy to version 1.26.8 - https://phabricator.wikimedia.org/T402584
  • 10:47 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 10:44 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS trixie
  • 10:41 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS trixie
  • 10:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P81863 and previous config saved to /var/cache/conftool/dbconfig/20250827-103921-fceratto.json
  • 10:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1159 (T402925)', diff saved to https://phabricator.wikimedia.org/P81862 and previous config saved to /var/cache/conftool/dbconfig/20250827-103834-ladsgroup.json
  • 10:38 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:29 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 10:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T401906)', diff saved to https://phabricator.wikimedia.org/P81861 and previous config saved to /var/cache/conftool/dbconfig/20250827-102414-fceratto.json
  • 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T401906)', diff saved to https://phabricator.wikimedia.org/P81860 and previous config saved to /var/cache/conftool/dbconfig/20250827-102041-fceratto.json
  • 10:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T401906)', diff saved to https://phabricator.wikimedia.org/P81859 and previous config saved to /var/cache/conftool/dbconfig/20250827-102029-fceratto.json
  • 10:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P81857 and previous config saved to /var/cache/conftool/dbconfig/20250827-100521-fceratto.json
  • 10:01 moritzm: installing libxslt security updates
  • 09:56 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 09:56 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2009.codfw.wmnet with OS trixie
  • 09:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P81856 and previous config saved to /var/cache/conftool/dbconfig/20250827-095014-fceratto.json
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2009.codfw.wmnet with OS trixie
  • 09:43 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host testvm2005.codfw.wmnet with OS trixie
  • 09:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T401906)', diff saved to https://phabricator.wikimedia.org/P81855 and previous config saved to /var/cache/conftool/dbconfig/20250827-093507-fceratto.json
  • 09:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T401906)', diff saved to https://phabricator.wikimedia.org/P81854 and previous config saved to /var/cache/conftool/dbconfig/20250827-093239-fceratto.json
  • 09:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 08:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host testvm2005.codfw.wmnet with OS trixie
  • 08:35 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on install2004.wikimedia.org with reason: being replaced by install2005
  • 08:13 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.16 refs T396377
  • 08:03 jmm@dns1004: END - running authdns-update
  • 08:02 jmm@dns1004: START - running authdns-update
  • 06:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T402925)', diff saved to https://phabricator.wikimedia.org/P81851 and previous config saved to /var/cache/conftool/dbconfig/20250827-060103-ladsgroup.json
  • 05:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P81850 and previous config saved to /var/cache/conftool/dbconfig/20250827-054555-ladsgroup.json
  • 05:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P81849 and previous config saved to /var/cache/conftool/dbconfig/20250827-053047-ladsgroup.json
  • 05:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T402925)', diff saved to https://phabricator.wikimedia.org/P81848 and previous config saved to /var/cache/conftool/dbconfig/20250827-051540-ladsgroup.json
  • 05:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2228 (T402925)', diff saved to https://phabricator.wikimedia.org/P81846 and previous config saved to /var/cache/conftool/dbconfig/20250827-050446-ladsgroup.json
  • 05:04 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 05:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T402925)', diff saved to https://phabricator.wikimedia.org/P81845 and previous config saved to /var/cache/conftool/dbconfig/20250827-050423-ladsgroup.json
  • 04:49 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P81844 and previous config saved to /var/cache/conftool/dbconfig/20250827-044915-ladsgroup.json
  • 04:34 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P81843 and previous config saved to /var/cache/conftool/dbconfig/20250827-043407-ladsgroup.json
  • 04:19 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T402925)', diff saved to https://phabricator.wikimedia.org/P81842 and previous config saved to /var/cache/conftool/dbconfig/20250827-041900-ladsgroup.json
  • 04:06 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2223 (T402925)', diff saved to https://phabricator.wikimedia.org/P81841 and previous config saved to /var/cache/conftool/dbconfig/20250827-040605-ladsgroup.json
  • 04:05 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 04:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T402925)', diff saved to https://phabricator.wikimedia.org/P81840 and previous config saved to /var/cache/conftool/dbconfig/20250827-040542-ladsgroup.json
  • 03:50 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P81839 and previous config saved to /var/cache/conftool/dbconfig/20250827-035035-ladsgroup.json
  • 03:35 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P81838 and previous config saved to /var/cache/conftool/dbconfig/20250827-033527-ladsgroup.json
  • 03:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T402925)', diff saved to https://phabricator.wikimedia.org/P81837 and previous config saved to /var/cache/conftool/dbconfig/20250827-032019-ladsgroup.json
  • 03:07 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2211 (T402925)', diff saved to https://phabricator.wikimedia.org/P81836 and previous config saved to /var/cache/conftool/dbconfig/20250827-030713-ladsgroup.json
  • 03:07 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 02:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T402925)', diff saved to https://phabricator.wikimedia.org/P81835 and previous config saved to /var/cache/conftool/dbconfig/20250827-025529-ladsgroup.json
  • 02:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P81834 and previous config saved to /var/cache/conftool/dbconfig/20250827-024021-ladsgroup.json
  • 02:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P81833 and previous config saved to /var/cache/conftool/dbconfig/20250827-022513-ladsgroup.json
  • 02:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T402925)', diff saved to https://phabricator.wikimedia.org/P81829 and previous config saved to /var/cache/conftool/dbconfig/20250827-021006-ladsgroup.json
  • 02:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2192 (T402925)', diff saved to https://phabricator.wikimedia.org/P81828 and previous config saved to /var/cache/conftool/dbconfig/20250827-020046-ladsgroup.json
  • 02:00 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T402925)', diff saved to https://phabricator.wikimedia.org/P81827 and previous config saved to /var/cache/conftool/dbconfig/20250827-020023-ladsgroup.json
  • 01:45 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P81826 and previous config saved to /var/cache/conftool/dbconfig/20250827-014516-ladsgroup.json
  • 01:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P81824 and previous config saved to /var/cache/conftool/dbconfig/20250827-013008-ladsgroup.json
  • 01:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T402925)', diff saved to https://phabricator.wikimedia.org/P81823 and previous config saved to /var/cache/conftool/dbconfig/20250827-011501-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 25s)
  • 01:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2178 (T402925)', diff saved to https://phabricator.wikimedia.org/P81822 and previous config saved to /var/cache/conftool/dbconfig/20250827-010246-ladsgroup.json
  • 01:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 01:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T402925)', diff saved to https://phabricator.wikimedia.org/P81821 and previous config saved to /var/cache/conftool/dbconfig/20250827-010223-ladsgroup.json
  • 01:01 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:50 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2039
  • 00:50 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2039
  • 00:47 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P81819 and previous config saved to /var/cache/conftool/dbconfig/20250827-004716-ladsgroup.json
  • 00:35 zabe@deploy1003: Finished scap sync-world: Backport for BacklinkCache: Use LinksMigration for categorylinks, BacklinkCache: Use LinksMigration for categorylinks (duration: 11m 49s)
  • 00:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P81818 and previous config saved to /var/cache/conftool/dbconfig/20250827-003208-ladsgroup.json
  • 00:30 zabe@deploy1003: zabe: Continuing with sync
  • 00:29 zabe@deploy1003: zabe: Backport for BacklinkCache: Use LinksMigration for categorylinks, BacklinkCache: Use LinksMigration for categorylinks synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 00:23 zabe@deploy1003: Started scap sync-world: Backport for BacklinkCache: Use LinksMigration for categorylinks, BacklinkCache: Use LinksMigration for categorylinks
  • 00:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T402925)', diff saved to https://phabricator.wikimedia.org/P81817 and previous config saved to /var/cache/conftool/dbconfig/20250827-001701-ladsgroup.json
  • 00:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2171 (T402925)', diff saved to https://phabricator.wikimedia.org/P81816 and previous config saved to /var/cache/conftool/dbconfig/20250827-000227-ladsgroup.json
  • 00:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 00:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T402925)', diff saved to https://phabricator.wikimedia.org/P81815 and previous config saved to /var/cache/conftool/dbconfig/20250827-000203-ladsgroup.json

2025-08-26

  • 23:46 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P81814 and previous config saved to /var/cache/conftool/dbconfig/20250826-234656-ladsgroup.json
  • 23:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P81813 and previous config saved to /var/cache/conftool/dbconfig/20250826-233148-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2039* gradually with 4 steps - Work done
  • 23:16 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T402925)', diff saved to https://phabricator.wikimedia.org/P81811 and previous config saved to /var/cache/conftool/dbconfig/20250826-231641-ladsgroup.json
  • 23:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db2157 (T402925)', diff saved to https://phabricator.wikimedia.org/P81809 and previous config saved to /var/cache/conftool/dbconfig/20250826-230201-ladsgroup.json
  • 23:01 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 23:01 rzl: reprepro -C main includedeb bullseye-wikimedia /srv/wikimedia/pool/component/envoy-future/e/envoyproxy/envoyproxy_1.26.8-1_amd64.deb # T402584
  • 22:50 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 22:49 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 22:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 22:40 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 22:40 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 22:37 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 22:35 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es2039* gradually with 4 steps - Work done
  • 22:33 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 22:33 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 22:31 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 22:30 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 22:29 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 22:25 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 22:23 jdlrobson@deploy1003: Finished scap sync-world: Backport for Fix VectorTypeahead config to avoid + (duration: 12m 28s)
  • 22:18 jdlrobson@deploy1003: bwang, jdlrobson: Continuing with sync
  • 22:17 jdlrobson@deploy1003: bwang, jdlrobson: Backport for Fix VectorTypeahead config to avoid + synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:11 jdlrobson@deploy1003: Started scap sync-world: Backport for Fix VectorTypeahead config to avoid +
  • 22:07 jdlrobson@deploy1003: Finished scap sync-world: Backport for Add support for typeahead search options in config (T402051), Explicitly define enwiki wgVectorTypeahead config (T397084) (duration: 16m 35s)
  • 22:00 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 21:57 jdlrobson@deploy1003: jdlrobson: Backport for Add support for typeahead search options in config (T402051), Explicitly define enwiki wgVectorTypeahead config (T397084) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:51 jdlrobson@deploy1003: Started scap sync-world: Backport for Add support for typeahead search options in config (T402051), Explicitly define enwiki wgVectorTypeahead config (T397084)
  • 21:36 jdlrobson@deploy1003: Sync cancelled.
  • 21:23 jdlrobson@deploy1003: bwang, jdlrobson: Backport for Update vector search config with new wgVectorTypeahead (T397084) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:17 jdlrobson@deploy1003: Started scap sync-world: Backport for Update vector search config with new wgVectorTypeahead (T397084)
  • 20:30 dancy@deploy1003: Finished scap sync-world: Testing T402508 fix (phase 2) (duration: 13m 25s)
  • 20:16 dancy@deploy1003: Started scap sync-world: Testing T402508 fix (phase 2)
  • 20:16 dancy@deploy1003: Finished scap sync-world: Testing T402508 fix (duration: 55m 01s)
  • 20:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T401906)', diff saved to https://phabricator.wikimedia.org/P81805 and previous config saved to /var/cache/conftool/dbconfig/20250826-200437-fceratto.json
  • 19:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81804 and previous config saved to /var/cache/conftool/dbconfig/20250826-194930-fceratto.json
  • 19:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81803 and previous config saved to /var/cache/conftool/dbconfig/20250826-193422-fceratto.json
  • 19:21 dancy@deploy1003: Started scap sync-world: Testing T402508 fix
  • 19:20 dancy@deploy1003: Installation of scap version "4.210.0" completed for 2 hosts
  • 19:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T401906)', diff saved to https://phabricator.wikimedia.org/P81802 and previous config saved to /var/cache/conftool/dbconfig/20250826-191915-fceratto.json
  • 19:18 dancy@deploy1003: Installing scap version "4.210.0" for 2 host(s)
  • 19:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T401906)', diff saved to https://phabricator.wikimedia.org/P81801 and previous config saved to /var/cache/conftool/dbconfig/20250826-191702-fceratto.json
  • 19:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 19:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T401906)', diff saved to https://phabricator.wikimedia.org/P81800 and previous config saved to /var/cache/conftool/dbconfig/20250826-191640-fceratto.json
  • 19:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81799 and previous config saved to /var/cache/conftool/dbconfig/20250826-190133-fceratto.json
  • 18:59 rzl@deploy1003: helmfile [staging] DONE helmfile.d/services/mathoid: apply
  • 18:49 rzl@deploy1003: helmfile [staging] START helmfile.d/services/mathoid: apply
  • 18:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81798 and previous config saved to /var/cache/conftool/dbconfig/20250826-184625-fceratto.json
  • 18:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T401906)', diff saved to https://phabricator.wikimedia.org/P81797 and previous config saved to /var/cache/conftool/dbconfig/20250826-183118-fceratto.json
  • 18:29 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T401906)', diff saved to https://phabricator.wikimedia.org/P81796 and previous config saved to /var/cache/conftool/dbconfig/20250826-182905-fceratto.json
  • 18:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 18:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T401906)', diff saved to https://phabricator.wikimedia.org/P81795 and previous config saved to /var/cache/conftool/dbconfig/20250826-182842-fceratto.json
  • 18:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81794 and previous config saved to /var/cache/conftool/dbconfig/20250826-181334-fceratto.json
  • 17:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81793 and previous config saved to /var/cache/conftool/dbconfig/20250826-175827-fceratto.json
  • 17:44 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on people1005.eqiad.wmnet with reason: T402596
  • 17:43 ammarpad@deploy1003: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=mediawikiwiki 'API:Main page' 'API:Action API' Ammarpad '--reason=per phab:T402800' # T402800
  • 17:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T401906)', diff saved to https://phabricator.wikimedia.org/P81791 and previous config saved to /var/cache/conftool/dbconfig/20250826-174319-fceratto.json
  • 17:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 17:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 17:41 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T401906)', diff saved to https://phabricator.wikimedia.org/P81790 and previous config saved to /var/cache/conftool/dbconfig/20250826-174106-fceratto.json
  • 17:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 17:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 17:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T401906)', diff saved to https://phabricator.wikimedia.org/P81789 and previous config saved to /var/cache/conftool/dbconfig/20250826-174023-fceratto.json
  • 17:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81788 and previous config saved to /var/cache/conftool/dbconfig/20250826-172516-fceratto.json
  • 17:19 swfrench@deploy1003: Finished scap sync-world: Backport for image-suggestion: cleanup unused refs to service listener (T368096) (duration: 12m 15s)
  • 17:13 swfrench@deploy1003: eevans, swfrench: Continuing with sync
  • 17:12 swfrench@deploy1003: eevans, swfrench: Backport for image-suggestion: cleanup unused refs to service listener (T368096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81787 and previous config saved to /var/cache/conftool/dbconfig/20250826-171008-fceratto.json
  • 17:06 swfrench@deploy1003: Started scap sync-world: Backport for image-suggestion: cleanup unused refs to service listener (T368096)
  • 17:03 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp2042.codfw.wmnet
  • 17:02 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2042.codfw.wmnet
  • 17:00 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp2042.codfw.wmnet
  • 16:59 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2042.codfw.wmnet
  • 16:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T401906)', diff saved to https://phabricator.wikimedia.org/P81786 and previous config saved to /var/cache/conftool/dbconfig/20250826-165501-fceratto.json
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T401906)', diff saved to https://phabricator.wikimedia.org/P81785 and previous config saved to /var/cache/conftool/dbconfig/20250826-165248-fceratto.json
  • 16:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 16:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T401906)', diff saved to https://phabricator.wikimedia.org/P81784 and previous config saved to /var/cache/conftool/dbconfig/20250826-165226-fceratto.json
  • 16:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81783 and previous config saved to /var/cache/conftool/dbconfig/20250826-163718-fceratto.json
  • 16:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81782 and previous config saved to /var/cache/conftool/dbconfig/20250826-162211-fceratto.json
  • 16:19 mutante: phabricator - added FCeratto-WMF to acl*sre-team
  • 16:17 moritzm: installing libxslt security updates
  • 16:12 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T352245)
  • 16:11 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T352245)
  • 16:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T401906)', diff saved to https://phabricator.wikimedia.org/P81781 and previous config saved to /var/cache/conftool/dbconfig/20250826-160703-fceratto.json
  • 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T401906)', diff saved to https://phabricator.wikimedia.org/P81780 and previous config saved to /var/cache/conftool/dbconfig/20250826-160451-fceratto.json
  • 16:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 16:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T401906)', diff saved to https://phabricator.wikimedia.org/P81779 and previous config saved to /var/cache/conftool/dbconfig/20250826-160427-fceratto.json
  • 15:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81778 and previous config saved to /var/cache/conftool/dbconfig/20250826-154920-fceratto.json
  • 15:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be2005.codfw.wmnet with OS bullseye
  • 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2082.codfw.wmnet with OS bullseye
  • 15:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2081.codfw.wmnet with OS bullseye
  • 15:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81777 and previous config saved to /var/cache/conftool/dbconfig/20250826-153412-fceratto.json
  • 15:28 swfrench-wmf: finished restart of all codfw-associated confds - T352245
  • 15:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
  • 15:25 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 15:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
  • 15:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T401906)', diff saved to https://phabricator.wikimedia.org/P81776 and previous config saved to /var/cache/conftool/dbconfig/20250826-151905-fceratto.json
  • 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be2005.codfw.wmnet with reason: host reimage
  • 15:18 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2082.codfw.wmnet with reason: host reimage
  • 15:17 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2081.codfw.wmnet with reason: host reimage
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T401906)', diff saved to https://phabricator.wikimedia.org/P81775 and previous config saved to /var/cache/conftool/dbconfig/20250826-151653-fceratto.json
  • 15:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T401906)', diff saved to https://phabricator.wikimedia.org/P81774 and previous config saved to /var/cache/conftool/dbconfig/20250826-151630-fceratto.json
  • 15:12 swfrench-wmf: finished etcd cfssl-PKI migration in codfw - T352245
  • 15:05 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host thanos-be2005.codfw.wmnet with OS bullseye
  • 15:04 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2082.codfw.wmnet with OS bullseye
  • 15:04 brennen@deploy1003: Finished deploy [phabricator/deployment@27d2f0b]: deploy phab1004 for T402930 (duration: 00m 38s)
  • 15:03 brennen@deploy1003: Started deploy [phabricator/deployment@27d2f0b]: deploy phab1004 for T402930
  • 15:03 brennen@deploy1003: Finished deploy [phabricator/deployment@27d2f0b]: deploy phab2002 for T402930 (duration: 00m 42s)
  • 15:03 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2081.codfw.wmnet with OS bullseye
  • 15:02 brennen@deploy1003: Started deploy [phabricator/deployment@27d2f0b]: deploy phab2002 for T402930
  • 15:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81773 and previous config saved to /var/cache/conftool/dbconfig/20250826-150123-fceratto.json
  • 14:57 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T402930
  • 14:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81772 and previous config saved to /var/cache/conftool/dbconfig/20250826-144616-fceratto.json
  • 14:42 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: foreachwikiindblist sul CentralAuth:FixRenameUserLocalLogs --logwiki=metawiki # T398177 (dry run)
  • 14:31 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1052
  • 14:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T401906)', diff saved to https://phabricator.wikimedia.org/P81770 and previous config saved to /var/cache/conftool/dbconfig/20250826-143109-fceratto.json
  • 14:30 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1052
  • 14:29 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T401906)', diff saved to https://phabricator.wikimedia.org/P81769 and previous config saved to /var/cache/conftool/dbconfig/20250826-142857-fceratto.json
  • 14:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81768 and previous config saved to /var/cache/conftool/dbconfig/20250826-142833-fceratto.json
  • 14:26 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:26 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:23 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81767 and previous config saved to /var/cache/conftool/dbconfig/20250826-141325-fceratto.json
  • 14:11 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1052.eqiad.wmnet with OS bullseye
  • 14:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:10 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:08 swfrench-wmf: starting etcd cfssl-PKI migration in codfw - T352245
  • 13:59 Lucas_WMDE: UTC afternoon backport+config window done (maintenance scripts are ongoing and will probably take a while longer)
  • 13:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81765 and previous config saved to /var/cache/conftool/dbconfig/20250826-135818-fceratto.json
  • 13:58 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2020.codfw.wmnet
  • 13:58 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2020.codfw.wmnet
  • 13:58 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2020.codfw.wmnet
  • 13:58 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2020.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2019.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2019.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2019.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2019.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2018.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2018.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2018.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2018.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe2017.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe2017.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe2017.codfw.wmnet
  • 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe2017.codfw.wmnet
  • 13:57 dcausse@deploy1003: Finished scap sync-world: Backport for Revert "NetworkSession: Only enable for private wikis" (T373826) (duration: 13m 49s)
  • 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1020.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1020.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1020.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1020.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1019.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1019.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1019.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1019.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1018.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1018.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1018.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1018.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1017.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1017.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1017.eqiad.wmnet
  • 13:56 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1017.eqiad.wmnet
  • 13:56 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 13:51 dcausse@deploy1003: dcausse: Continuing with sync
  • 13:49 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 13:48 dcausse@deploy1003: dcausse: Backport for Revert "NetworkSession: Only enable for private wikis" (T373826) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 13:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81764 and previous config saved to /var/cache/conftool/dbconfig/20250826-134311-fceratto.json
  • 13:42 dcausse@deploy1003: Started scap sync-world: Backport for Revert "NetworkSession: Only enable for private wikis" (T373826)
  • 13:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1052.eqiad.wmnet with reason: host reimage
  • 13:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81763 and previous config saved to /var/cache/conftool/dbconfig/20250826-134201-fceratto.json
  • 13:41 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe[1017-1020].eqiad.wmnet
  • 13:40 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe[1017-1020].eqiad.wmnet
  • 13:35 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: CentralAuth:FixRenamedUserGlobalEditCount metawiki # T313900 (dry run)
  • 13:35 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 13:34 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-fe[1017-1020].eqiad.wmnet with reason: reboot before bringing into service
  • 13:33 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for PHPSessionHandler: Better handle objects stored in the session (T402602), Add maint script to fix global edit count of renamed users (T313900), Add maint script to fix wrong actors in local log entries for global renames (T398177) (duration: 12m 54s)
  • 13:28 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frmx2002
  • 13:28 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host frmx2002
  • 13:28 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Continuing with sync
  • 13:28 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2039
  • 13:28 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2039
  • 13:27 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 lucaswerkmeister-wmde@deploy1003: matmarex, lucaswerkmeister-wmde: Backport for PHPSessionHandler: Better handle objects stored in the session (T402602), Add maint script to fix global edit count of renamed users (T313900), Add maint script to fix wrong actors in local log entries for global renames (T398177) synced to the testservers (see https://wikitech.wikim
  • 13:26 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 13:24 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 13:24 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 13:20 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for PHPSessionHandler: Better handle objects stored in the session (T402602), Add maint script to fix global edit count of renamed users (T313900), Add maint script to fix wrong actors in local log entries for global renames (T398177)
  • 13:20 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1052.eqiad.wmnet with OS bullseye
  • 13:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:06 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 13:02 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 12:57 jmm@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:56 jmm@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1052.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:54 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052 - jclark@cumin1002"
  • 12:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dnscloudcephosd1052 - jclark@cumin1002"
  • 12:50 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:48 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:16 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:15 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:15 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:14 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:12 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:11 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:56 Daimona: Running queries from T402239#11118710 in x1.wikishared to fix broken event addresses (again)
  • 11:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es2039.codfw.wmnet with reason: Glow up (T399927)
  • 11:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es1039.eqiad.wmnet with reason: Glow up (T399927)
  • 11:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1244 gradually with 4 steps - Work done
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool es2039 T402912', diff saved to https://phabricator.wikimedia.org/P81760 and previous config saved to /var/cache/conftool/dbconfig/20250826-111927-ladsgroup.json
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote es2038 to es7 primary T402912', diff saved to https://phabricator.wikimedia.org/P81759 and previous config saved to /var/cache/conftool/dbconfig/20250826-111630-ladsgroup.json
  • 11:14 Amir1: Starting es7 codfw failover from es2039 to es2038 - T402912
  • 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set es2038 with weight 0 T402912', diff saved to https://phabricator.wikimedia.org/P81758 and previous config saved to /var/cache/conftool/dbconfig/20250826-111015-ladsgroup.json
  • 11:09 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Primary switchover es7 T402912
  • 10:37 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool db1244 gradually with 4 steps - Work done
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install6002.wikimedia.org
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:31 jmm@dns1004: END - running authdns-update
  • 10:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install6002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:30 jmm@dns1004: START - running authdns-update
  • 10:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:18 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install6002.wikimedia.org
  • 10:05 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.16 refs T396377
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install5002.wikimedia.org
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:56 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:56 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install5002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:54 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:54 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:52 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:49 aklapper@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.16 refs T396377 (duration: 45m 59s)
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install5002.wikimedia.org
  • 09:43 jmm@dns1004: END - running authdns-update
  • 09:41 jmm@dns1004: START - running authdns-update
  • 09:03 aklapper@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.16 refs T396377
  • 08:57 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 08:35 kartik@deploy1003: Finished scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) (duration: 26m 46s)
  • 08:29 kartik@deploy1003: abi, kartik: Continuing with sync
  • 08:13 kartik@deploy1003: abi, kartik: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:08 kartik@deploy1003: Started scap sync-world: Backport for Setup tracking for CentralNotice banners experiment for WE2.1.1 (T402496)
  • 08:02 dcausse@deploy1003: Finished scap sync-world: Backport for Revert "SECURITY: declare PoolCounter settings for cirrusbuilddoc" (duration: 10m 53s)
  • 07:57 dcausse@deploy1003: dcausse: Continuing with sync
  • 07:57 dcausse@deploy1003: dcausse: Backport for Revert "SECURITY: declare PoolCounter settings for cirrusbuilddoc" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:52 dcausse@deploy1003: Started scap sync-world: Backport for Revert "SECURITY: declare PoolCounter settings for cirrusbuilddoc"
  • 07:48 dcausse@deploy1003: Finished scap sync-world: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220) (duration: 45m 38s)
  • 07:42 dcausse@deploy1003: dcausse: Continuing with sync
  • 07:08 dcausse@deploy1003: dcausse: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:02 dcausse@deploy1003: Started scap sync-world: Backport for SECURITY: declare PoolCounter settings for cirrusbuilddoc (T401220)
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.13 (duration: 01m 11s)
  • 02:17 TimStarling: on db2202 creating copy of enwiki.recentchanges for performance analysis T400696
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P81751 and previous config saved to /var/cache/conftool/dbconfig/20250826-015141-ladsgroup.json
  • 01:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81750 and previous config saved to /var/cache/conftool/dbconfig/20250826-013633-ladsgroup.json
  • 01:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81749 and previous config saved to /var/cache/conftool/dbconfig/20250826-012125-ladsgroup.json
  • 01:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P81748 and previous config saved to /var/cache/conftool/dbconfig/20250826-010618-ladsgroup.json
  • 00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T391056)', diff saved to https://phabricator.wikimedia.org/P81747 and previous config saved to /var/cache/conftool/dbconfig/20250826-005952-ladsgroup.json
  • 00:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 00:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 00:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 00:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 00:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 00:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 00:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 00:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 00:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 00:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 00:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1244.eqiad.wmnet
  • 00:07 brett: Run systemctl reset-failed on disappeared nrpe2nodexp-disk_space.timer units (T395446)

2025-08-25

  • 23:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1244 - Upgrading db1244.eqiad.wmnet
  • 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.depool db1244 - Upgrading db1244.eqiad.wmnet
  • 23:59 ladsgroup@cumin1002: START - Cookbook sre.mysql.upgrade for db1244.eqiad.wmnet
  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1244 T402871', diff saved to https://phabricator.wikimedia.org/P81746 and previous config saved to /var/cache/conftool/dbconfig/20250825-234856-ladsgroup.json
  • 23:47 ladsgroup@dns1004: END - running authdns-update
  • 23:45 ladsgroup@dns1004: START - running authdns-update
  • 23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T402871', diff saved to https://phabricator.wikimedia.org/P81745 and previous config saved to /var/cache/conftool/dbconfig/20250825-234303-ladsgroup.json
  • 23:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T402871', diff saved to https://phabricator.wikimedia.org/P81744 and previous config saved to /var/cache/conftool/dbconfig/20250825-233934-ladsgroup.json
  • 23:39 Amir1: Starting s4 eqiad failover from db1244 to db1160 - T402871
  • 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Set db1160 with weight 0 T402871', diff saved to https://phabricator.wikimedia.org/P81743 and previous config saved to /var/cache/conftool/dbconfig/20250825-233128-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T402871
  • 23:23 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 23:21 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 23:17 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest1003.eqiad.wmnet with reason: sleep test
  • 23:00 maryum: Deploy security fix for T397396
  • 22:55 maryum: Deploy security fix for T401220
  • 22:27 maryum: Deployed security fix for T298690
  • 22:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for Move update of category members count to a dedicated job (T365303) (duration: 12m 26s)
  • 22:15 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 22:14 ladsgroup@deploy1003: ladsgroup: Backport for Move update of category members count to a dedicated job (T365303) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:08 ladsgroup@deploy1003: Started scap sync-world: Backport for Move update of category members count to a dedicated job (T365303)
  • 22:05 ladsgroup@deploy1003: Sync cancelled.
  • 21:53 ladsgroup@deploy1003: ladsgroup: Backport for Move update of category members count to a dedicated job (T365303) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:47 ladsgroup@deploy1003: Started scap sync-world: Backport for Move update of category members count to a dedicated job (T365303)
  • 21:47 sbassett: Deployed updated security mitigations for T399627
  • 21:23 sbassett: Deployed security mitigations for T402146, T402077, T402095, T400525
  • 21:21 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:19 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest1003.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:17 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:17 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest1003.eqiad.wmnet with reason: sleep test
  • 21:16 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest1002.eqiad.wmnet with reason: sleep test
  • 21:15 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:13 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:08 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:08 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2009.codfw.wmnet with reason: sleep test
  • 21:07 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:03 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2006.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 21:00 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2006.codfw.wmnet with reason: sleep test
  • 20:59 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:54 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:53 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: sleep test
  • 20:53 rzl@deploy1003: mwscript-k8s job started: foreachwikiindblist mwscript.dblist Version.php # dblist: https://phabricator.wikimedia.org/P81742
  • 20:50 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:48 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:46 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2003.codfw.wmnet with reason: sleep test
  • 20:45 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:38 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:35 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:34 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy2003.codfw.wmnet with OS bookworm
  • 20:31 ebernhardson@deploy1003: Finished scap sync-world: Backport for EventStream: Enable hive ingestion for wcqs-external.sparql-query (T391383), cirrus: Enable phrase suggester variant (T397083) (duration: 13m 04s)
  • 20:26 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 20:23 ebernhardson@deploy1003: ebernhardson: Backport for EventStream: Enable hive ingestion for wcqs-external.sparql-query (T391383), cirrus: Enable phrase suggester variant (T397083) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:19 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:18 ebernhardson@deploy1003: Started scap sync-world: Backport for EventStream: Enable hive ingestion for wcqs-external.sparql-query (T391383), cirrus: Enable phrase suggester variant (T397083)
  • 20:15 arlolra@deploy1003: Finished scap sync-world: Backport for Deploy Parsoid Read Views to ~20 Wikipedias (T402349) (duration: 12m 40s)
  • 20:14 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:14 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 20:10 arlolra@deploy1003: arlolra: Continuing with sync
  • 20:08 arlolra@deploy1003: arlolra: Backport for Deploy Parsoid Read Views to ~20 Wikipedias (T402349) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 arlolra@deploy1003: Started scap sync-world: Backport for Deploy Parsoid Read Views to ~20 Wikipedias (T402349)
  • 20:00 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:51 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:49 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2010.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:48 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:43 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:40 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2001.codfw.wmnet with reason: sleep test
  • 19:26 dancy@deploy1003: Installation of scap version "4.209.0" completed for 169 hosts
  • 19:26 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 19:26 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 19:22 dancy@deploy1003: Installing scap version "4.209.0" for 169 host(s)
  • 19:21 thcipriani: restart apache gerrit1003
  • 19:15 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host deploy2003.codfw.wmnet with OS bookworm
  • 19:14 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['deploy2003']
  • 19:14 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['deploy2003']
  • 19:09 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:01 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T399249)', diff saved to https://phabricator.wikimedia.org/P81739 and previous config saved to /var/cache/conftool/dbconfig/20250825-181920-fceratto.json
  • 18:16 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy2003.codfw.wmnet with OS bookworm
  • 18:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P81737 and previous config saved to /var/cache/conftool/dbconfig/20250825-180413-fceratto.json
  • 17:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P81736 and previous config saved to /var/cache/conftool/dbconfig/20250825-174905-fceratto.json
  • 17:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T399249)', diff saved to https://phabricator.wikimedia.org/P81735 and previous config saved to /var/cache/conftool/dbconfig/20250825-173358-fceratto.json
  • 17:26 swfrench@deploy1003: Finished scap sync-world: Helmfile-only deployment for php.version override cleanup - T401721 (duration: 03m 34s)
  • 17:24 swfrench@deploy1003: Started scap sync-world: Helmfile-only deployment for php.version override cleanup - T401721
  • 16:56 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host deploy2003.codfw.wmnet with OS bookworm
  • 16:43 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating new f servers in codfw - jhancock@cumin1003"
  • 16:38 sukhe@dns1004: END - running authdns-update
  • 16:37 sukhe@dns1004: START - running authdns-update
  • 16:36 hnowlan@dns1004: END - running authdns-update
  • 16:35 hnowlan@dns1004: START - running authdns-update
  • 16:16 urbanecm@deploy1003: Finished scap sync-world: Backport for [Growth] wikidata: Preconfigure for limited Growth features release (T400937) (duration: 11m 49s)
  • 16:13 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cr1-codfw with reason: suppress alerts so we can re-seat one of the PSUs
  • 16:11 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 16:10 urbanecm@deploy1003: urbanecm: Backport for [Growth] wikidata: Preconfigure for limited Growth features release (T400937) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:07 topranks: set unused FPC 0 line card to offline mode on cr1-codfw T401937
  • 16:04 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] wikidata: Preconfigure for limited Growth features release (T400937)
  • 15:52 stevemunene@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-druid1002.eqiad.wmnet
  • 15:52 stevemunene@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:50 stevemunene@cumin1003: START - Cookbook sre.dns.netbox
  • 15:49 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating new f servers in codfw - jhancock@cumin1003"
  • 15:45 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:41 stevemunene@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-druid1002.eqiad.wmnet
  • 15:33 Daimona: Running queries from T402239#11115333 in x1.wikishared to fix broken event addresses
  • 15:29 stevemunene@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-druid1001.eqiad.wmnet
  • 15:29 stevemunene@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 stevemunene@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-druid1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1003"
  • 15:29 stevemunene@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-druid1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - stevemunene@cumin1003"
  • 15:24 stevemunene@cumin1003: START - Cookbook sre.dns.netbox
  • 15:17 stevemunene@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-druid1001.eqiad.wmnet
  • 15:17 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:17 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating new frack mgmt ips - jhancock@cumin1003"
  • 15:17 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating new frack mgmt ips - jhancock@cumin1003"
  • 15:12 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 15:05 moritzm: imported wmf-laptop 1.0.3 to apt.wikimedia.org
  • 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for frac pdus - jclark@cumin1002"
  • 15:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for frac pdus - jclark@cumin1002"
  • 15:00 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 15:00 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 14:59 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:56 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 14:36 ejegg: payments-wiki upgraded from 37616266 to 69bfc93f
  • 14:21 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on commonswiki (T397912) (duration: 11m 56s)
  • 14:16 zabe@deploy1003: zabe: Continuing with sync
  • 14:15 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on commonswiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:09 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on commonswiki (T397912)
  • 13:28 ejegg: standalone SmashPig upgraded from 77dc08bd to 5be2918e
  • 13:27 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:26 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for PHPSessionHandler: In warn mode, report the changed keys (T400668), Set wgPHPSessionHandling to 'warn' again (T362324) (duration: 13m 14s)
  • 13:21 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex: Continuing with sync
  • 13:20 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, matmarex: Backport for PHPSessionHandler: In warn mode, report the changed keys (T400668), Set wgPHPSessionHandling to 'warn' again (T362324) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:17 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:14 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:13 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for PHPSessionHandler: In warn mode, report the changed keys (T400668), Set wgPHPSessionHandling to 'warn' again (T362324)
  • 13:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 13:11 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2034.codfw.wmnet
  • 13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2034.codfw.wmnet
  • 12:23 hashar: Restarted CI Jenkins to update some plugins
  • 11:24 jmm@dns1004: END - running authdns-update
  • 11:23 jmm@dns1004: START - running authdns-update
  • 11:12 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:09 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:04 kharlan@deploy1003: Finished scap sync-world: Backport for hcaptcha: Instrument siteverify API call (T402492), hCaptcha: Log errors to Logstash (T402767) (duration: 14m 26s)
  • 11:01 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:58 kharlan@deploy1003: kharlan: Continuing with sync
  • 10:58 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:56 kharlan@deploy1003: kharlan: Backport for hcaptcha: Instrument siteverify API call (T402492), hCaptcha: Log errors to Logstash (T402767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:54 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:54 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:50 kharlan@deploy1003: Started scap sync-world: Backport for hcaptcha: Instrument siteverify API call (T402492), hCaptcha: Log errors to Logstash (T402767)
  • 10:47 moritzm: installing openjdk-17 security updates
  • 10:30 moritzm: installing postgresql-13 security updates
  • 10:24 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 10:22 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 10:15 urbanecm@deploy1003: Finished scap sync-world: Deploying a security patch (T402698, T402600) (duration: 15m 06s)
  • 10:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 10:09 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 10:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts install4002.wikimedia.org
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: install4002.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 10:00 urbanecm@deploy1003: Started scap sync-world: Deploying a security patch (T402698, T402600)
  • 09:57 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:54 urbanecm@deploy1003: Finished scap sync-world: Deploying a security patch (T402698, T402600) (duration: 02m 06s)
  • 09:52 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts install4002.wikimedia.org
  • 09:52 urbanecm@deploy1003: Started scap sync-world: Deploying a security patch (T402698, T402600)
  • 09:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 09:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T399249)', diff saved to https://phabricator.wikimedia.org/P81734 and previous config saved to /var/cache/conftool/dbconfig/20250825-093241-fceratto.json
  • 09:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 08:44 jmm@dns1004: END - running authdns-update
  • 08:43 jmm@dns1004: START - running authdns-update
  • 07:36 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 07:33 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 07:24 kharlan@deploy1003: Finished scap sync-world: Backport for hcaptcha: Delay challenge execution until submit (T402641) (duration: 41m 22s)
  • 07:11 kharlan@deploy1003: kharlan: Continuing with sync
  • 07:08 kharlan@deploy1003: kharlan: Backport for hcaptcha: Delay challenge execution until submit (T402641) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 06:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 06:43 kharlan@deploy1003: Started scap sync-world: Backport for hcaptcha: Delay challenge execution until submit (T402641)

2025-08-24

  • 15:25 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:24 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:24 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:23 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:23 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:22 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 06:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 06:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T399249)', diff saved to https://phabricator.wikimedia.org/P81733 and previous config saved to /var/cache/conftool/dbconfig/20250824-060527-fceratto.json
  • 05:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P81732 and previous config saved to /var/cache/conftool/dbconfig/20250824-055019-fceratto.json
  • 05:36 swfrench-wmf: cancelled extended downtime for db1252 (host is repooling) - T399249
  • 05:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P81731 and previous config saved to /var/cache/conftool/dbconfig/20250824-053511-fceratto.json
  • 05:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T399249)', diff saved to https://phabricator.wikimedia.org/P81730 and previous config saved to /var/cache/conftool/dbconfig/20250824-052004-fceratto.json
  • 05:08 swfrench@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 12:00:00 on db1252.eqiad.wmnet with reason: Maintenance - T399249

2025-08-23

  • 20:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1252 (T399249)', diff saved to https://phabricator.wikimedia.org/P81729 and previous config saved to /var/cache/conftool/dbconfig/20250823-205835-fceratto.json
  • 20:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 20:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T399249)', diff saved to https://phabricator.wikimedia.org/P81728 and previous config saved to /var/cache/conftool/dbconfig/20250823-205824-fceratto.json
  • 20:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81727 and previous config saved to /var/cache/conftool/dbconfig/20250823-204316-fceratto.json
  • 20:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81726 and previous config saved to /var/cache/conftool/dbconfig/20250823-202809-fceratto.json
  • 20:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T399249)', diff saved to https://phabricator.wikimedia.org/P81725 and previous config saved to /var/cache/conftool/dbconfig/20250823-201301-fceratto.json
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T399249)', diff saved to https://phabricator.wikimedia.org/P81722 and previous config saved to /var/cache/conftool/dbconfig/20250823-125942-fceratto.json
  • 12:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T399249)', diff saved to https://phabricator.wikimedia.org/P81721 and previous config saved to /var/cache/conftool/dbconfig/20250823-125920-fceratto.json
  • 12:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P81720 and previous config saved to /var/cache/conftool/dbconfig/20250823-124412-fceratto.json
  • 12:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P81719 and previous config saved to /var/cache/conftool/dbconfig/20250823-122904-fceratto.json
  • 12:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T399249)', diff saved to https://phabricator.wikimedia.org/P81718 and previous config saved to /var/cache/conftool/dbconfig/20250823-121357-fceratto.json
  • 06:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T399249)', diff saved to https://phabricator.wikimedia.org/P81717 and previous config saved to /var/cache/conftool/dbconfig/20250823-060334-fceratto.json
  • 06:03 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 06:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T399249)', diff saved to https://phabricator.wikimedia.org/P81716 and previous config saved to /var/cache/conftool/dbconfig/20250823-060311-fceratto.json
  • 05:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P81715 and previous config saved to /var/cache/conftool/dbconfig/20250823-054804-fceratto.json
  • 05:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P81714 and previous config saved to /var/cache/conftool/dbconfig/20250823-053256-fceratto.json
  • 05:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T399249)', diff saved to https://phabricator.wikimedia.org/P81713 and previous config saved to /var/cache/conftool/dbconfig/20250823-051748-fceratto.json

2025-08-22

  • 22:25 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T399249)', diff saved to https://phabricator.wikimedia.org/P81712 and previous config saved to /var/cache/conftool/dbconfig/20250822-222526-fceratto.json
  • 22:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 21:35 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on sretest2006.codfw.wmnet with reason: sleep test
  • 21:23 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people2004.codfw.wmnet
  • 21:23 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people2004.codfw.wmnet with OS trixie
  • 21:15 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:14 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:08 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people2004.codfw.wmnet with reason: host reimage
  • 21:03 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on people2004.codfw.wmnet with reason: host reimage
  • 20:44 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host people2004.codfw.wmnet with OS trixie
  • 20:42 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2004.codfw.wmnet - dzahn@cumin1002"
  • 20:41 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people2004.codfw.wmnet - dzahn@cumin1002"
  • 20:41 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people2004.codfw.wmnet on all recursors
  • 20:41 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache people2004.codfw.wmnet on all recursors
  • 20:41 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:41 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2004.codfw.wmnet - dzahn@cumin1002"
  • 20:41 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people2004.codfw.wmnet - dzahn@cumin1002"
  • 20:36 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 20:36 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host people2004.codfw.wmnet
  • 20:26 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:24 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:21 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:21 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:15 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:13 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:10 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:09 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:06 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 20:05 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:58 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:58 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:56 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:56 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:45 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 19:42 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:37 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:35 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:35 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:29 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 19:25 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 18:57 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 18:57 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 18:57 krinkle@deploy1003: Finished deploy [integration/docroot@2d9ffad]: (no justification provided) (duration: 00m 11s)
  • 18:56 krinkle@deploy1003: Started deploy [integration/docroot@2d9ffad]: (no justification provided)
  • 18:51 krinkle@deploy1003: Finished deploy [integration/docroot@5918d5e]: Add support for lcov (duration: 00m 21s)
  • 18:51 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:51 krinkle@deploy1003: Started deploy [integration/docroot@5918d5e]: Add support for lcov
  • 18:46 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 18:41 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:31 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 18:28 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 18:28 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 18:22 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 18:22 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 17:53 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 17:39 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 17:35 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 17:32 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on people1005.eqiad.wmnet with reason: T402596
  • 17:25 rzl: sudo -i docker-registryctl delete-tags docker-registry.discovery.wmnet/envoy-future:1.26.8-1 # T402584
  • 17:15 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 17:14 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 17:14 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 17:13 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 17:12 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 15:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T399249)', diff saved to https://phabricator.wikimedia.org/P81711 and previous config saved to /var/cache/conftool/dbconfig/20250822-155124-fceratto.json
  • 15:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P81710 and previous config saved to /var/cache/conftool/dbconfig/20250822-153617-fceratto.json
  • 15:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P81709 and previous config saved to /var/cache/conftool/dbconfig/20250822-152109-fceratto.json
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T399249)', diff saved to https://phabricator.wikimedia.org/P81708 and previous config saved to /var/cache/conftool/dbconfig/20250822-150602-fceratto.json
  • 14:57 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 14:37 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 14:34 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1045.eqiad.wmnet with reason: host reimage
  • 14:05 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 14:04 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitarium_restart (exit_code=0)
  • 13:52 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:51 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:43 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitarium_restart
  • 13:40 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:40 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:39 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2151* gradually with 4 steps - Pooling in
  • 13:17 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:16 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 13:15 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 13:15 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 12:56 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd1045
  • 12:56 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 12:56 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd1045
  • 12:56 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 12:54 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2151* gradually with 4 steps - Pooling in
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install6003.wikimedia.org
  • 12:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install6003.wikimedia.org with OS bookworm
  • 12:50 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2148* gradually with 4 steps - Pooling in
  • 12:41 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2157* gradually with 4 steps - Pooling in
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install6003.wikimedia.org with reason: host reimage
  • 12:28 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install6003.wikimedia.org with reason: host reimage
  • 12:05 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install6003.wikimedia.org with OS bookworm
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install6003.wikimedia.org - jmm@cumin2002"
  • 12:05 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install6003.wikimedia.org - jmm@cumin2002"
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install6003.wikimedia.org on all recursors
  • 12:04 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install6003.wikimedia.org on all recursors
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6003.wikimedia.org - jmm@cumin2002"
  • 12:04 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2148* gradually with 4 steps - Pooling in
  • 12:03 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install6003.wikimedia.org - jmm@cumin2002"
  • 12:00 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:00 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install6003.wikimedia.org
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install5003.wikimedia.org
  • 11:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install5003.wikimedia.org with OS bookworm
  • 11:56 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2157* gradually with 4 steps - Pooling in
  • 11:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 11:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 11:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install5003.wikimedia.org with reason: host reimage
  • 11:35 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install5003.wikimedia.org with reason: host reimage
  • 11:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T401906)', diff saved to https://phabricator.wikimedia.org/P81693 and previous config saved to /var/cache/conftool/dbconfig/20250822-112259-fceratto.json
  • 11:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install5003.wikimedia.org with OS bookworm
  • 10:46 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5003.wikimedia.org - jmm@cumin2002"
  • 10:46 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install5003.wikimedia.org - jmm@cumin2002"
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install5003.wikimedia.org on all recursors
  • 10:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install5003.wikimedia.org on all recursors
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5003.wikimedia.org - jmm@cumin2002"
  • 10:28 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install5003.wikimedia.org - jmm@cumin2002"
  • 10:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 10:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install5003.wikimedia.org
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install4003.wikimedia.org
  • 10:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install4003.wikimedia.org with OS bookworm
  • 10:13 vgutierrez: switch webrequest_sampled sampling from sequence number to kafka offset - T401383
  • 09:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install4003.wikimedia.org with reason: host reimage
  • 09:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install4003.wikimedia.org with reason: host reimage
  • 09:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2079.codfw.wmnet with OS bullseye
  • 09:33 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install4003.wikimedia.org with OS bookworm
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install4003.wikimedia.org - jmm@cumin2002"
  • 09:33 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install4003.wikimedia.org - jmm@cumin2002"
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install4003.wikimedia.org on all recursors
  • 09:32 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install4003.wikimedia.org on all recursors
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4003.wikimedia.org - jmm@cumin2002"
  • 09:32 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install4003.wikimedia.org - jmm@cumin2002"
  • 09:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:28 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install4003.wikimedia.org
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install2005.wikimedia.org
  • 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install2005.wikimedia.org with OS bookworm
  • 09:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
  • 09:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 09:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2079.codfw.wmnet with reason: host reimage
  • 09:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 09:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install2005.wikimedia.org with reason: host reimage
  • 09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install2005.wikimedia.org with reason: host reimage
  • 09:01 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2079
  • 09:01 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2079
  • 08:58 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2079
  • 08:58 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2079.codfw.wmnet 244.48.192.10.in-addr.arpa 4.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:58 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2079.codfw.wmnet 244.48.192.10.in-addr.arpa 4.4.2.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
  • 08:58 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:58 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2079 - mvernon@cumin2002"
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 08:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2079 - mvernon@cumin2002"
  • 08:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 08:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install2005.wikimedia.org with OS bookworm
  • 08:39 mvernon@cumin2002: START - Cookbook sre.dns.netbox
  • 08:39 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2079
  • 08:38 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2079.codfw.wmnet with OS bullseye
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install2005.wikimedia.org - jmm@cumin2002"
  • 08:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install2005.wikimedia.org - jmm@cumin2002"
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install2005.wikimedia.org on all recursors
  • 08:37 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install2005.wikimedia.org on all recursors
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2005.wikimedia.org - jmm@cumin2002"
  • 08:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install2005.wikimedia.org - jmm@cumin2002"
  • 08:22 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:22 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install2005.wikimedia.org
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host install1005.wikimedia.org
  • 08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host install1005.wikimedia.org with OS bookworm
  • 08:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T401906)', diff saved to https://phabricator.wikimedia.org/P81691 and previous config saved to /var/cache/conftool/dbconfig/20250822-080922-fceratto.json
  • 08:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 08:02 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on install1005.wikimedia.org with reason: host reimage
  • 07:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 07:59 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on install1005.wikimedia.org with reason: host reimage
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T399249)', diff saved to https://phabricator.wikimedia.org/P81690 and previous config saved to /var/cache/conftool/dbconfig/20250822-074842-fceratto.json
  • 07:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T399249)', diff saved to https://phabricator.wikimedia.org/P81689 and previous config saved to /var/cache/conftool/dbconfig/20250822-074819-fceratto.json
  • 07:40 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host install1005.wikimedia.org with OS bookworm
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install1005.wikimedia.org - jmm@cumin2002"
  • 07:35 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM install1005.wikimedia.org - jmm@cumin2002"
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) install1005.wikimedia.org on all recursors
  • 07:35 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache install1005.wikimedia.org on all recursors
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1005.wikimedia.org - jmm@cumin2002"
  • 07:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P81688 and previous config saved to /var/cache/conftool/dbconfig/20250822-073312-fceratto.json
  • 07:31 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM install1005.wikimedia.org - jmm@cumin2002"
  • 07:30 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 07:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host install1005.wikimedia.org
  • 07:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P81687 and previous config saved to /var/cache/conftool/dbconfig/20250822-071804-fceratto.json
  • 07:13 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T401906)', diff saved to https://phabricator.wikimedia.org/P81686 and previous config saved to /var/cache/conftool/dbconfig/20250822-071356-fceratto.json
  • 07:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 07:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T399249)', diff saved to https://phabricator.wikimedia.org/P81685 and previous config saved to /var/cache/conftool/dbconfig/20250822-070257-fceratto.json
  • 03:44 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 00:38 rzl: reprepro -C component/envoy-future include bullseye-wikimedia envoyproxy_1.26.8-1_amd64.changes # T402584 🤦

2025-08-21

  • 23:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T399249)', diff saved to https://phabricator.wikimedia.org/P81684 and previous config saved to /var/cache/conftool/dbconfig/20250821-235503-fceratto.json
  • 23:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 23:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T399249)', diff saved to https://phabricator.wikimedia.org/P81683 and previous config saved to /var/cache/conftool/dbconfig/20250821-235440-fceratto.json
  • 23:43 ryankemper@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
  • 23:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P81682 and previous config saved to /var/cache/conftool/dbconfig/20250821-233932-fceratto.json
  • 23:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P81681 and previous config saved to /var/cache/conftool/dbconfig/20250821-232425-fceratto.json
  • 23:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) es2040* gradually with 4 steps - Work done
  • 23:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T399249)', diff saved to https://phabricator.wikimedia.org/P81679 and previous config saved to /var/cache/conftool/dbconfig/20250821-230916-fceratto.json
  • 22:33 ejegg: payments-wiki upgraded from 207b4d6a to 37616266
  • 22:31 ladsgroup@cumin1002: START - Cookbook sre.mysql.pool es2040* gradually with 4 steps - Work done
  • 22:21 denisse: Upgrading to Grafana 12.1.1 in grafana - T402544
  • 22:19 denisse: Upgrading to Grafana 12.1.1 - T402544
  • 22:19 rzl: reprepro -C component/envoy-future include bullseye-wikimedia envoyproxy_1.26.8-1_source.changes # T402584
  • 22:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1027.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:10 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 22:04 dzahn@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host people1005.eqiad.wmnet
  • 22:04 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host people1005.eqiad.wmnet with OS trixie
  • 22:01 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:01 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:00 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 22:00 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 21:57 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:53 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on people1005.eqiad.wmnet with reason: host reimage
  • 21:52 ejegg: payments-wiki upgraded from cb76e2b7 to 207b4d6a
  • 21:49 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 21:48 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on people1005.eqiad.wmnet with reason: host reimage
  • 21:38 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:33 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:27 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host people1005.eqiad.wmnet with OS trixie
  • 21:26 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 55 hosts with reason: T400160
  • 21:26 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:26 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:26 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1005.eqiad.wmnet - dzahn@cumin1002"
  • 21:26 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM people1005.eqiad.wmnet - dzahn@cumin1002"
  • 21:25 bking@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on 55 hosts with reason: T395571
  • 21:25 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) people1005.eqiad.wmnet on all recursors
  • 21:25 dzahn@cumin1002: START - Cookbook sre.dns.wipe-cache people1005.eqiad.wmnet on all recursors
  • 21:25 dzahn@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:25 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1005.eqiad.wmnet - dzahn@cumin1002"
  • 21:25 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 21:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1027.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:23 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1027.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1027.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:22 ryankemper: T386098 Depooled eqiad `wdqs-internal-scholarly` in preparation for data transfer
  • 21:21 ryankemper@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-internal-scholarly,name=eqiad
  • 21:21 reedy@deploy1003: Finished scap sync-world: Backport for CommonSettings: Add hcaptcha.wikimedia.org to $wgCrossSiteAJAXdomains (T382148) (duration: 11m 39s)
  • 21:18 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM people1005.eqiad.wmnet - dzahn@cumin1002"
  • 21:18 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:16 reedy@deploy1003: reedy: Continuing with sync
  • 21:15 reedy@deploy1003: reedy: Backport for CommonSettings: Add hcaptcha.wikimedia.org to $wgCrossSiteAJAXdomains (T382148) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:14 dzahn@cumin1002: START - Cookbook sre.dns.netbox
  • 21:14 dzahn@cumin1002: START - Cookbook sre.ganeti.makevm for new host people1005.eqiad.wmnet
  • 21:09 reedy@deploy1003: Started scap sync-world: Backport for CommonSettings: Add hcaptcha.wikimedia.org to $wgCrossSiteAJAXdomains (T382148)
  • 20:55 ejegg: payments-wiki upgraded from 1235f11f to cb76e2b7
  • 20:54 ejegg: donorwiki upgraded from 5dcb98fd to cb76e2b7
  • 20:53 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on large s7 and s8 wikis (T399579) (duration: 11m 46s)
  • 20:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:47 zabe@deploy1003: zabe: Continuing with sync
  • 20:47 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on large s7 and s8 wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:41 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on large s7 and s8 wikis (T399579)
  • 20:40 zabe@deploy1003: Finished scap sync-world: Backport for Update redirected link (duration: 11m 06s)
  • 20:39 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 20:38 mutante: deleted a bunch of old bounce messages in the exim queue on lists1004
  • 20:37 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
  • 20:35 zabe@deploy1003: zabe, meno25: Continuing with sync
  • 20:35 zabe@deploy1003: zabe, meno25: Backport for Update redirected link synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2024.codfw.wmnet -> wdqs2027.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:29 zabe@deploy1003: Started scap sync-world: Backport for Update redirected link
  • 20:29 zabe@deploy1003: Finished scap sync-world: Backport for Set categorylinks to read new on enwiki (T397912) (duration: 11m 58s)
  • 20:23 zabe@deploy1003: zabe: Continuing with sync
  • 20:23 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 20:22 zabe@deploy1003: zabe: Backport for Set categorylinks to read new on enwiki (T397912) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:19 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
  • 20:19 mutante: lists1004 - sudo exim4 -qf - forced delivery attempt as reaction to alerting about large mail queue
  • 20:17 zabe@deploy1003: Started scap sync-world: Backport for Set categorylinks to read new on enwiki (T397912)
  • 20:07 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
  • 20:00 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:54 ejegg: payments-wiki rolled back from 49bef1cf to 1235f11f
  • 19:53 ejegg: payments-wiki upgraded from 1235f11f to 49bef1cf
  • 19:50 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:48 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: supermicro
  • 19:48 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:44 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:38 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:38 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:37 brett@dns1004: END - running authdns-update
  • 19:36 brett@dns1004: START - running authdns-update
  • 19:35 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:35 jhathaway@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:13 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:11 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: supermicro
  • 19:04 bking@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2113\.codfw\.wmnet
  • 19:04 bking@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=cirrussearch2091\.codfw\.wmnet
  • 19:01 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1047
  • 19:00 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1047
  • 18:56 bking@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch2091.
  • 18:55 bking@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch2113.
  • 18:55 bking@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch2113.
  • 18:55 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1044
  • 18:54 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1044
  • 18:44 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1043
  • 18:44 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1043
  • 18:36 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1042
  • 18:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1042
  • 18:24 dancy@deploy1003: Installation of scap version "4.208.0" completed for 169 hosts
  • 18:23 denisse@deploy1003: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.8.0 - T402263 (duration: 00m 18s)
  • 18:22 denisse@deploy1003: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.8.0 - T402263
  • 18:20 dancy@deploy1003: Installing scap version "4.208.0" for 169 host(s)
  • 18:18 denisse@deploy1003: Finished deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.8.0 - T402263 (duration: 00m 08s)
  • 18:18 denisse@deploy1003: Started deploy [librenms/librenms@f049593]: Upgrade LibreNMS to 25.8.0 - T402263
  • 18:18 denisse: Upgrading LibreNMS to v25.8.0 - T402263
  • 18:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 18:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T399249)', diff saved to https://phabricator.wikimedia.org/P81673 and previous config saved to /var/cache/conftool/dbconfig/20250821-181010-fceratto.json
  • 18:03 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 18:02 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 18:01 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 18:01 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P81672 and previous config saved to /var/cache/conftool/dbconfig/20250821-175503-fceratto.json
  • 17:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P81671 and previous config saved to /var/cache/conftool/dbconfig/20250821-173955-fceratto.json
  • 17:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T399249)', diff saved to https://phabricator.wikimedia.org/P81670 and previous config saved to /var/cache/conftool/dbconfig/20250821-172448-fceratto.json
  • 17:22 reedy@deploy1003: Finished scap sync-world: Backport for Replace use of deprecated ParsoidExtensionAPI::addModuleStyles() (T402370), Avoid PHP notice in AbstractEventRegistrationSpecialPage (country field) (T402441) (duration: 14m 12s)
  • 17:15 reedy@deploy1003: reedy: Continuing with sync
  • 17:15 reedy@deploy1003: reedy: Backport for Replace use of deprecated ParsoidExtensionAPI::addModuleStyles() (T402370), Avoid PHP notice in AbstractEventRegistrationSpecialPage (country field) (T402441) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:08 reedy@deploy1003: Started scap sync-world: Backport for Replace use of deprecated ParsoidExtensionAPI::addModuleStyles() (T402370), Avoid PHP notice in AbstractEventRegistrationSpecialPage (country field) (T402441)
  • 16:55 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new PHP production images and drop unused metadata label - T402424 T401254 (duration: 36m 37s)
  • 16:43 swfrench@deploy1003: swfrench: Continuing with sync
  • 16:42 swfrench@deploy1003: swfrench: Deployment to pick up new PHP production images and drop unused metadata label - T402424 T401254 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:23 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 16:19 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new PHP production images and drop unused metadata label - T402424 T401254
  • 16:16 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 16:12 kharlan@deploy1003: Finished scap sync-world: Backport for hCaptcha: Fix topic name for frontend metrics (duration: 10m 17s)
  • 16:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T399249)', diff saved to https://phabricator.wikimedia.org/P81668 and previous config saved to /var/cache/conftool/dbconfig/20250821-160838-fceratto.json
  • 16:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 16:07 kharlan@deploy1003: kharlan: Continuing with sync
  • 16:06 kharlan@deploy1003: kharlan: Backport for hCaptcha: Fix topic name for frontend metrics synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:04 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 16:02 kharlan@deploy1003: Started scap sync-world: Backport for hCaptcha: Fix topic name for frontend metrics
  • 15:53 inflatador_: set cirrussearch2089 to active in netbox T399943
  • 15:46 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T399249)', diff saved to https://phabricator.wikimedia.org/P81667 and previous config saved to /var/cache/conftool/dbconfig/20250821-154605-fceratto.json
  • 15:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 15:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T399249)', diff saved to https://phabricator.wikimedia.org/P81666 and previous config saved to /var/cache/conftool/dbconfig/20250821-154543-fceratto.json
  • 15:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:33 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P81664 and previous config saved to /var/cache/conftool/dbconfig/20250821-153036-fceratto.json
  • 15:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 15:22 zabe@deploy1003: Finished scap sync-world: Backport for Do not bypass LinksMigration for categorylinks (T402494) (duration: 10m 47s)
  • 15:17 zabe@deploy1003: zabe: Continuing with sync
  • 15:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P81663 and previous config saved to /var/cache/conftool/dbconfig/20250821-151528-fceratto.json
  • 15:14 zabe@deploy1003: zabe: Backport for Do not bypass LinksMigration for categorylinks (T402494) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:12 joal@deploy1003: Finished deploy [analytics/refinery@9fc3b38] (thin): Regular analytics weekly train THIN [analytics/refinery@9fc3b380] (duration: 00m 56s)
  • 15:11 joal@deploy1003: Started deploy [analytics/refinery@9fc3b38] (thin): Regular analytics weekly train THIN [analytics/refinery@9fc3b380]
  • 15:11 zabe@deploy1003: Started scap sync-world: Backport for Do not bypass LinksMigration for categorylinks (T402494)
  • 15:11 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 15:11 joal@deploy1003: Finished deploy [analytics/refinery@9fc3b38]: Regular analytics weekly train [analytics/refinery@9fc3b380] (duration: 03m 40s)
  • 15:08 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 15:07 joal@deploy1003: Started deploy [analytics/refinery@9fc3b38]: Regular analytics weekly train [analytics/refinery@9fc3b380]
  • 15:06 joal@deploy1003: Finished deploy [analytics/refinery@9fc3b38] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9fc3b380] (duration: 00m 55s)
  • 15:05 joal@deploy1003: Started deploy [analytics/refinery@9fc3b38] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@9fc3b380]
  • 15:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T399249)', diff saved to https://phabricator.wikimedia.org/P81662 and previous config saved to /var/cache/conftool/dbconfig/20250821-150021-fceratto.json
  • 14:52 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 14:50 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 14:50 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.sanitize-wiki (exit_code=99) Checking sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 14:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 14:47 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 14:44 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis amwikimedia, cnwikimedia, donatewiki, gewikimedia, grwikimedia, hiwikimedia, idwikimedia, maiwikimedia, ngwikimedia, nostalgiawiki, punjabiwikimedia, romdwikimedia, rswikimedia, votewiki, wbwikimedia in section s5
  • 14:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on es2040.codfw.wmnet with reason: 10GB-fication
  • 14:33 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 14:33 bking@cumin1002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool es2040 T399927', diff saved to https://phabricator.wikimedia.org/P81661 and previous config saved to /var/cache/conftool/dbconfig/20250821-143039-ladsgroup.json
  • 14:17 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 14:16 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 13:52 kartik@deploy1003: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250821 (T387427) (duration: 19m 38s)
  • 13:47 kartik@deploy1003: kartik: Continuing with sync
  • 13:47 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 13:46 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 13:36 kartik@deploy1003: kartik: Backport for CX3 Build 1.0.0+20250821 (T387427) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:32 kartik@deploy1003: Started scap sync-world: Backport for CX3 Build 1.0.0+20250821 (T387427)
  • 13:30 urbanecm@deploy1003: Finished scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_NEW (T397476) (duration: 14m 01s)
  • 13:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 13:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T399249)', diff saved to https://phabricator.wikimedia.org/P81660 and previous config saved to /var/cache/conftool/dbconfig/20250821-132839-fceratto.json
  • 13:27 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 13:27 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 13:27 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 13:26 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 13:25 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 13:25 urbanecm@deploy1003: urbanecm, daimona: Continuing with sync
  • 13:22 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 13:21 urbanecm@deploy1003: urbanecm, daimona: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_NEW (T397476) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:21 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfix - oblivian@cumin1003"
  • 13:21 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix - oblivian@cumin1003
  • 13:20 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix - oblivian@cumin1003
  • 13:20 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfix - oblivian@cumin1003"
  • 13:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 13:16 urbanecm@deploy1003: Started scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_NEW (T397476)
  • 13:16 urbanecm@deploy1003: Finished scap sync-world: Backport for bewwiktionary: set sitename, project namespace & timezone (T402134), bewwiktionary: add logos (T402134) (duration: 11m 34s)
  • 13:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P81659 and previous config saved to /var/cache/conftool/dbconfig/20250821-131333-fceratto.json
  • 13:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm7001.magru.wmnet with OS bookworm
  • 13:09 urbanecm@deploy1003: urbanecm, anzx: Continuing with sync
  • 13:07 urbanecm@deploy1003: urbanecm, anzx: Backport for bewwiktionary: set sitename, project namespace & timezone (T402134), bewwiktionary: add logos (T402134) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:04 urbanecm@deploy1003: Started scap sync-world: Backport for bewwiktionary: set sitename, project namespace & timezone (T402134), bewwiktionary: add logos (T402134)
  • 12:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P81658 and previous config saved to /var/cache/conftool/dbconfig/20250821-125825-fceratto.json
  • 12:53 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 12:48 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 12:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T399249)', diff saved to https://phabricator.wikimedia.org/P81657 and previous config saved to /var/cache/conftool/dbconfig/20250821-124318-fceratto.json
  • 12:24 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host testvm7001.magru.wmnet with OS bookworm
  • 12:21 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 12:21 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 12:18 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 12:18 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 12:15 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 12:15 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 12:14 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 12:11 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 12:11 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 12:11 stevemunene@deploy1003: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'sync'.
  • 11:33 stevemunene@deploy1003: helmfile [dse-k8s-codfw] START helmfile.d/admin 'sync'.
  • 11:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T399249)', diff saved to https://phabricator.wikimedia.org/P81656 and previous config saved to /var/cache/conftool/dbconfig/20250821-113101-fceratto.json
  • 11:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 11:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T399249)', diff saved to https://phabricator.wikimedia.org/P81655 and previous config saved to /var/cache/conftool/dbconfig/20250821-113039-fceratto.json
  • 11:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P81654 and previous config saved to /var/cache/conftool/dbconfig/20250821-111531-fceratto.json
  • 11:10 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm7001.magru.wmnet with OS bookworm
  • 11:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P81653 and previous config saved to /var/cache/conftool/dbconfig/20250821-110024-fceratto.json
  • 10:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 10:50 stevemunene@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs[2013-2014].codfw.wmnet} and A:lvs (T397301)
  • 10:48 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 10:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T399249)', diff saved to https://phabricator.wikimedia.org/P81652 and previous config saved to /var/cache/conftool/dbconfig/20250821-104516-fceratto.json
  • 10:37 stevemunene@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs[2013-2014].codfw.wmnet} and A:lvs (T397301)
  • 10:26 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host testvm7001.magru.wmnet with OS bookworm
  • 10:25 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host testvm7001.magru.wmnet with OS bookworm
  • 10:17 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host testvm7001.magru.wmnet with OS bookworm
  • 09:47 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:30 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T399249)', diff saved to https://phabricator.wikimedia.org/P81651 and previous config saved to /var/cache/conftool/dbconfig/20250821-093011-fceratto.json
  • 09:30 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 09:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T399249)', diff saved to https://phabricator.wikimedia.org/P81650 and previous config saved to /var/cache/conftool/dbconfig/20250821-092948-fceratto.json
  • 09:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm7001.magru.wmnet
  • 09:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm7001.magru.wmnet with OS bookworm
  • 09:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P81648 and previous config saved to /var/cache/conftool/dbconfig/20250821-091441-fceratto.json
  • 09:03 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 08:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P81647 and previous config saved to /var/cache/conftool/dbconfig/20250821-085933-fceratto.json
  • 08:58 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm7001.magru.wmnet with reason: host reimage
  • 08:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T399249)', diff saved to https://phabricator.wikimedia.org/P81646 and previous config saved to /var/cache/conftool/dbconfig/20250821-084426-fceratto.json
  • 08:31 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host testvm7001.magru.wmnet with OS bookworm
  • 08:28 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.15 refs T396376
  • 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm7001.magru.wmnet - ayounsi@cumin1003"
  • 08:24 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm7001.magru.wmnet - ayounsi@cumin1003"
  • 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm7001.magru.wmnet on all recursors
  • 08:24 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache testvm7001.magru.wmnet on all recursors
  • 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm7001.magru.wmnet - ayounsi@cumin1003"
  • 08:23 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm7001.magru.wmnet - ayounsi@cumin1003"
  • 08:18 joelyrookewmde: Finished populateSitesTable for bewwiktionary https://phabricator.wikimedia.org/T402130
  • 08:18 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 08:18 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host testvm7001.magru.wmnet
  • 08:11 moritzm: installing openjdk-17 security updates
  • 08:11 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database tlwikisource (T388657)
  • 08:10 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database tlwikisource (T388657)
  • 08:10 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database madwikisource (T391770)
  • 08:10 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database madwikisource (T391770)
  • 08:10 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database minwikibooks (T395502)
  • 08:10 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database minwikibooks (T395502)
  • 08:10 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database zghwiktionary (T399788)
  • 08:09 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database zghwiktionary (T399788)
  • 08:09 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database bewwiktionary (T402137)
  • 08:09 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database bewwiktionary (T402137)
  • 08:08 taavi@cumin1003: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database rkiwiki (T392502)
  • 07:53 joelyrookewmde: ^for T402130
  • 07:52 joelyrookewmde@deploy1003: mwscript-k8s job started: foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https # [Add wikidata support ticket PhabId]
  • 07:31 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T399249)', diff saved to https://phabricator.wikimedia.org/P81645 and previous config saved to /var/cache/conftool/dbconfig/20250821-073125-fceratto.json
  • 07:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 07:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81644 and previous config saved to /var/cache/conftool/dbconfig/20250821-073102-fceratto.json
  • 07:21 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T399249)', diff saved to https://phabricator.wikimedia.org/P81643 and previous config saved to /var/cache/conftool/dbconfig/20250821-072136-fceratto.json
  • 07:21 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 07:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T399249)', diff saved to https://phabricator.wikimedia.org/P81642 and previous config saved to /var/cache/conftool/dbconfig/20250821-072113-fceratto.json
  • 07:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P81641 and previous config saved to /var/cache/conftool/dbconfig/20250821-071554-fceratto.json
  • 07:13 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2026.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 07:12 taavi@cumin1003: START - Cookbook sre.wikireplicas.add-wiki for database rkiwiki (T392502)
  • 07:08 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
  • 07:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P81640 and previous config saved to /var/cache/conftool/dbconfig/20250821-070605-fceratto.json
  • 07:01 moritzm: installing openjdk-21 security updates
  • 07:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P81639 and previous config saved to /var/cache/conftool/dbconfig/20250821-070047-fceratto.json
  • 06:56 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Add deprecation scope - oblivian@cumin1003"
  • 06:56 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Add deprecation scope - oblivian@cumin1003
  • 06:55 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Add deprecation scope - oblivian@cumin1003
  • 06:55 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Add deprecation scope - oblivian@cumin1003"
  • 06:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P81638 and previous config saved to /var/cache/conftool/dbconfig/20250821-065057-fceratto.json
  • 06:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81637 and previous config saved to /var/cache/conftool/dbconfig/20250821-064539-fceratto.json
  • 06:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T399249)', diff saved to https://phabricator.wikimedia.org/P81636 and previous config saved to /var/cache/conftool/dbconfig/20250821-063550-fceratto.json
  • 06:32 eileen: config revision changed from 45c6fa38 to 3378426a
  • 06:22 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2026.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 06:19 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:25 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:24 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 05:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81635 and previous config saved to /var/cache/conftool/dbconfig/20250821-051809-fceratto.json
  • 05:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T399249)', diff saved to https://phabricator.wikimedia.org/P81634 and previous config saved to /var/cache/conftool/dbconfig/20250821-051746-fceratto.json
  • 05:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P81633 and previous config saved to /var/cache/conftool/dbconfig/20250821-050239-fceratto.json
  • 04:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P81632 and previous config saved to /var/cache/conftool/dbconfig/20250821-044731-fceratto.json
  • 04:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1026.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 04:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T399249)', diff saved to https://phabricator.wikimedia.org/P81631 and previous config saved to /var/cache/conftool/dbconfig/20250821-043224-fceratto.json
  • 02:10 eileen: civicrm upgraded from 279b3993 to c3b729c0
  • 01:41 ejegg: payments-wiki rolled back from 0944453c to 1235f11f
  • 01:23 ejegg: payments-wiki upgraded from 1235f11f to 0944453c
  • 01:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T399249)', diff saved to https://phabricator.wikimedia.org/P81625 and previous config saved to /var/cache/conftool/dbconfig/20250821-012253-fceratto.json
  • 01:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 01:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T399249)', diff saved to https://phabricator.wikimedia.org/P81624 and previous config saved to /var/cache/conftool/dbconfig/20250821-012230-fceratto.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 11s)
  • 01:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P81623 and previous config saved to /var/cache/conftool/dbconfig/20250821-010723-fceratto.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P81622 and previous config saved to /var/cache/conftool/dbconfig/20250821-005215-fceratto.json
  • 00:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T399249)', diff saved to https://phabricator.wikimedia.org/P81621 and previous config saved to /var/cache/conftool/dbconfig/20250821-003707-fceratto.json
  • 00:24 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T399249)', diff saved to https://phabricator.wikimedia.org/P81620 and previous config saved to /var/cache/conftool/dbconfig/20250821-002406-fceratto.json
  • 00:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 00:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 00:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T399249)', diff saved to https://phabricator.wikimedia.org/P81619 and previous config saved to /var/cache/conftool/dbconfig/20250821-002325-fceratto.json
  • 00:09 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:09 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 00:08 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 00:08 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P81618 and previous config saved to /var/cache/conftool/dbconfig/20250821-000817-fceratto.json
  • 00:05 vriley@cumin1003: START - Cookbook sre.dns.netbox

2025-08-20

  • 23:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis bewwiktionary in section s5
  • 23:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P81617 and previous config saved to /var/cache/conftool/dbconfig/20250820-235310-fceratto.json
  • 23:46 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis bewwiktionary in section s5
  • 23:43 zabe@deploy1003: Finished scap sync-world: Backport for Update interwiki cache (T402130) (duration: 10m 03s)
  • 23:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T399249)', diff saved to https://phabricator.wikimedia.org/P81616 and previous config saved to /var/cache/conftool/dbconfig/20250820-233802-fceratto.json
  • 23:37 zabe@deploy1003: zabe: Continuing with sync
  • 23:37 zabe@deploy1003: zabe: Backport for Update interwiki cache (T402130) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:32 zabe@deploy1003: Started scap sync-world: Backport for Update interwiki cache (T402130)
  • 23:30 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1044.eqiad.wmnet with OS bullseye
  • 23:30 zabe@deploy1003: Finished scap sync-world: Backport for Activate bewwiktionary (T402130) (duration: 11m 12s)
  • 23:24 zabe@deploy1003: zabe: Continuing with sync
  • 23:23 zabe@deploy1003: zabe: Backport for Activate bewwiktionary (T402130) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:18 zabe@deploy1003: Started scap sync-world: Backport for Activate bewwiktionary (T402130)
  • 23:16 zabe: create Wiktionary Betawi # T402130
  • 23:15 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 23:14 eileen: tools upgraded from 0d6df34a to 284579a9
  • 23:13 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for bewwiktionary (T402130) (duration: 11m 59s)
  • 23:12 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 23:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T399249)', diff saved to https://phabricator.wikimedia.org/P81615 and previous config saved to /var/cache/conftool/dbconfig/20250820-231148-fceratto.json
  • 23:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T399249)', diff saved to https://phabricator.wikimedia.org/P81614 and previous config saved to /var/cache/conftool/dbconfig/20250820-231107-fceratto.json
  • 23:11 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 23:08 zabe@deploy1003: zabe: Continuing with sync
  • 23:08 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 23:06 zabe@deploy1003: zabe: Backport for Initial configuration for bewwiktionary (T402130) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:01 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for bewwiktionary (T402130)
  • 22:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 22:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P81613 and previous config saved to /var/cache/conftool/dbconfig/20250820-225600-fceratto.json
  • 22:54 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 22:52 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1051.eqiad.wmnet with reason: host reimage
  • 22:51 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1050.eqiad.wmnet with reason: host reimage
  • 22:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 22:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2019.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 22:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2008.codfw.wmnet -> wdqs2020.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 22:45 jdlrobson@deploy1003: Finished scap sync-world: Backport for Hide content for wgReadingListsAnonymizedPreviews = true (T402050), Hide content for wgReadingListsAnonymizedPreviews = true (T402050) (duration: 10m 54s)
  • 22:41 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bullseye
  • 22:40 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1044.eqiad.wmnet with OS bullseye
  • 22:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P81612 and previous config saved to /var/cache/conftool/dbconfig/20250820-224052-fceratto.json
  • 22:40 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 22:39 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 22:38 jdlrobson@deploy1003: jdlrobson: Backport for Hide content for wgReadingListsAnonymizedPreviews = true (T402050), Hide content for wgReadingListsAnonymizedPreviews = true (T402050) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:38 cwhite: resize prometheus/k8s-dse +25G on prometheus100[78]
  • 22:37 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 22:34 jdlrobson@deploy1003: Started scap sync-world: Backport for Hide content for wgReadingListsAnonymizedPreviews = true (T402050), Hide content for wgReadingListsAnonymizedPreviews = true (T402050)
  • 22:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1021.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:28 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bullseye
  • 22:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T399249)', diff saved to https://phabricator.wikimedia.org/P81611 and previous config saved to /var/cache/conftool/dbconfig/20250820-222544-fceratto.json
  • 22:25 jdlrobson@deploy1003: Finished scap sync-world: Backport for Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050), Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050) (duration: 11m 07s)
  • 22:25 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1051.eqiad.wmnet with OS bullseye
  • 22:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1050.eqiad.wmnet with OS bullseye
  • 22:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 22:21 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:20 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 22:19 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 22:18 jdlrobson@deploy1003: jdlrobson: Backport for Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050), Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:17 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 22:14 jdlrobson@deploy1003: Started scap sync-world: Backport for Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050), Restore $wgReadingListsAnonymizedPreviews feature flag for shared lists (T402050)
  • 22:13 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1049.eqiad.wmnet with reason: host reimage
  • 22:11 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 22:04 ryankemper: [WDQS] `ryankemper@wdqs1016:~$ sudo systemctl restart wdqs-blazegraph`
  • 22:00 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 21:57 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:56 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1048.eqiad.wmnet with reason: host reimage
  • 21:52 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2008.codfw.wmnet -> wdqs2020.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:51 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2019.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:46 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1049.eqiad.wmnet with OS bullseye
  • 21:44 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:42 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1013.eqiad.wmnet -> wdqs1021.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:39 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1012.eqiad.wmnet -> wdqs1017.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:37 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:33 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:30 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:27 ejegg: fundraising civicrm upgraded from b6f2ff27 to 279b3993
  • 21:27 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1048.eqiad.wmnet with OS bullseye
  • 21:25 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:25 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:24 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:24 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:23 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:23 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:23 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:23 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1011.eqiad.wmnet -> wdqs1025.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:21 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:20 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 21:20 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 21:20 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:20 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1045 - vriley@cumin1003"
  • 21:20 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1045 - vriley@cumin1003"
  • 21:16 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 21:16 ecarg@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 21:15 ecarg@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 21:15 ecarg@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 21:14 ecarg@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 21:13 ecarg@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 21:12 ecarg@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 21:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T399249)', diff saved to https://phabricator.wikimedia.org/P81610 and previous config saved to /var/cache/conftool/dbconfig/20250820-210500-fceratto.json
  • 21:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 21:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T399249)', diff saved to https://phabricator.wikimedia.org/P81609 and previous config saved to /var/cache/conftool/dbconfig/20250820-210437-fceratto.json
  • 20:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P81608 and previous config saved to /var/cache/conftool/dbconfig/20250820-204929-fceratto.json
  • 20:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P81607 and previous config saved to /var/cache/conftool/dbconfig/20250820-203422-fceratto.json
  • 20:23 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 20:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T399249)', diff saved to https://phabricator.wikimedia.org/P81606 and previous config saved to /var/cache/conftool/dbconfig/20250820-201914-fceratto.json
  • 20:02 brett@dns1004: END - running authdns-update
  • 20:01 brett@dns1004: START - running authdns-update
  • 19:13 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:13 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 19:12 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 19:11 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 18:56 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 18:55 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 18:54 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T399249)', diff saved to https://phabricator.wikimedia.org/P81605 and previous config saved to /var/cache/conftool/dbconfig/20250820-185344-fceratto.json
  • 18:53 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T399249)', diff saved to https://phabricator.wikimedia.org/P81604 and previous config saved to /var/cache/conftool/dbconfig/20250820-185321-fceratto.json
  • 18:45 ejegg: standalone (IPN listener) SmashPig upgraded from ab4b9cd1 to 77dc08bd
  • 18:43 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:43 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:39 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P81603 and previous config saved to /var/cache/conftool/dbconfig/20250820-183814-fceratto.json
  • 18:24 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P81600 and previous config saved to /var/cache/conftool/dbconfig/20250820-182307-fceratto.json
  • 18:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T399249)', diff saved to https://phabricator.wikimedia.org/P81599 and previous config saved to /var/cache/conftool/dbconfig/20250820-180759-fceratto.json
  • 17:54 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 17:44 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 17:28 swfrench@deploy1003: Finished scap sync-world: No-build deployment to apply mw-debug/next helmfile diffs - T401254 (duration: 05m 57s)
  • 17:23 swfrench@deploy1003: Started scap sync-world: No-build deployment to apply mw-debug/next helmfile diffs - T401254
  • 17:19 swfrench@deploy1003: Stopping before sync operations
  • 17:18 swfrench@deploy1003: Started scap sync-world: No-sync deployment to verify mw-debug/next helmfile diffs - T401254
  • 17:08 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up build-report cleanup - T401721 (duration: 02m 41s)
  • 17:05 swfrench@deploy1003: Started scap sync-world: Deployment to pick up build-report cleanup - T401721
  • 16:48 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Bugfix - oblivian@cumin1003"
  • 16:48 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix - oblivian@cumin1003
  • 16:47 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Bugfix - oblivian@cumin1003
  • 16:47 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Bugfix - oblivian@cumin1003"
  • 16:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T399249)', diff saved to https://phabricator.wikimedia.org/P81597 and previous config saved to /var/cache/conftool/dbconfig/20250820-164005-fceratto.json
  • 16:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 16:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81596 and previous config saved to /var/cache/conftool/dbconfig/20250820-163942-fceratto.json
  • 16:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P81595 and previous config saved to /var/cache/conftool/dbconfig/20250820-162435-fceratto.json
  • 16:16 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UX improvements - oblivian@cumin1003"
  • 16:16 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UX improvements - oblivian@cumin1003
  • 16:15 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UX improvements - oblivian@cumin1003
  • 16:15 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UX improvements - oblivian@cumin1003"
  • 16:11 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579) (duration: 08m 16s)
  • 16:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P81594 and previous config saved to /var/cache/conftool/dbconfig/20250820-160927-fceratto.json
  • 16:06 zabe@deploy1003: zabe: Continuing with sync
  • 16:06 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:03 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579)
  • 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81593 and previous config saved to /var/cache/conftool/dbconfig/20250820-155420-fceratto.json
  • 15:49 jgleeson: SmashPig standalone upgraded from ebb6e309 to ab4b9cd1
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T399249)', diff saved to https://phabricator.wikimedia.org/P81591 and previous config saved to /var/cache/conftool/dbconfig/20250820-154749-fceratto.json
  • 15:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 15:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T399249)', diff saved to https://phabricator.wikimedia.org/P81590 and previous config saved to /var/cache/conftool/dbconfig/20250820-154726-fceratto.json
  • 15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host build2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:43 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host build2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 15:38 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:36 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P81589 and previous config saved to /var/cache/conftool/dbconfig/20250820-153219-fceratto.json
  • 15:31 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:29 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 15:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P81588 and previous config saved to /var/cache/conftool/dbconfig/20250820-151712-fceratto.json
  • 15:05 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker2003.codfw.wmnet
  • 15:05 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker2002.codfw.wmnet
  • 15:05 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker2001.codfw.wmnet
  • 15:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T399249)', diff saved to https://phabricator.wikimedia.org/P81587 and previous config saved to /var/cache/conftool/dbconfig/20250820-150204-fceratto.json
  • 14:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81586 and previous config saved to /var/cache/conftool/dbconfig/20250820-143222-fceratto.json
  • 14:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 14:19 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubemaster),name=dse-k8s-ctrl2002.codfw.wmnet
  • 14:19 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubemaster),name=dse-k8s-ctrl2001.codfw.wmnet
  • 13:03 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:45 claime: Restarting ipoid-daily-update job - T402388
  • 12:42 ayounsi@cumin1003: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas7001.wikimedia.org
  • 12:42 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas7001.wikimedia.org - ayounsi@cumin1003"
  • 12:42 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas7001.wikimedia.org - ayounsi@cumin1003"
  • 12:41 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas7001.wikimedia.org on all recursors
  • 12:41 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache atlas7001.wikimedia.org on all recursors
  • 12:41 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:41 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas7001.wikimedia.org - ayounsi@cumin1003"
  • 12:41 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas7001.wikimedia.org - ayounsi@cumin1003"
  • 12:38 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 12:37 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 12:37 ayounsi@cumin1003: START - Cookbook sre.ganeti.makevm for new host atlas7001.wikimedia.org
  • 12:36 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 12:36 jnuche@deploy1003: Finished scap sync-world: Backport for Omit empty username in JCApiUtils::initApiRequestObj (T402273) (duration: 07m 55s)
  • 12:36 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 12:35 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 12:35 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 12:35 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 12:31 jnuche@deploy1003: jnuche: Continuing with sync
  • 12:30 jnuche@deploy1003: jnuche: Backport for Omit empty username in JCApiUtils::initApiRequestObj (T402273) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:30 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 12:28 jnuche@deploy1003: Started scap sync-world: Backport for Omit empty username in JCApiUtils::initApiRequestObj (T402273)
  • 12:26 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:26 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:26 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:24 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:15 jnuche@deploy1003: Installation of scap version "4.207.0" completed for 4 hosts
  • 11:12 jnuche@deploy1003: Installing scap version "4.207.0" for 4 host(s)
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1019.eqiad.wmnet
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1018.eqiad.wmnet
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1017.eqiad.wmnet
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1016.eqiad.wmnet
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1015.eqiad.wmnet
  • 10:38 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1013.eqiad.wmnet
  • 10:37 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1012.eqiad.wmnet
  • 10:33 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1011.eqiad.wmnet
  • 10:32 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1010.eqiad.wmnet
  • 10:31 stevemunene@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: service=(kubesvc),name=dse-k8s-worker1009.eqiad.wmnet
  • 09:44 Dreamy_Jazz: Running `/usr/local/bin/foreachwikiindblist mediamoderation-continuous-scan.dblist extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --force --start-timestamp "20230701010101" --batch-size "5000"`
  • 09:11 derick@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=foundationwiki --logwiki=metawiki Selfkilla666 Cowsheepcool # T402364
  • 08:18 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.15 refs T396376
  • 07:17 kartik@deploy1003: Finished scap sync-world: Backport for MinT: Add stream configuration and registration (T397600 T397043) (duration: 11m 21s)
  • 07:12 kartik@deploy1003: kartik, hueitan: Continuing with sync
  • 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T399249)', diff saved to https://phabricator.wikimedia.org/P81582 and previous config saved to /var/cache/conftool/dbconfig/20250820-071043-fceratto.json
  • 07:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T399249)', diff saved to https://phabricator.wikimedia.org/P81581 and previous config saved to /var/cache/conftool/dbconfig/20250820-071020-fceratto.json
  • 07:08 kartik@deploy1003: kartik, hueitan: Backport for MinT: Add stream configuration and registration (T397600 T397043) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:06 kartik@deploy1003: Started scap sync-world: Backport for MinT: Add stream configuration and registration (T397600 T397043)
  • 06:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P81580 and previous config saved to /var/cache/conftool/dbconfig/20250820-065513-fceratto.json
  • 06:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P81579 and previous config saved to /var/cache/conftool/dbconfig/20250820-064005-fceratto.json
  • 06:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 06:32 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru sandbox to routed ganeti - ayounsi@cumin1003"
  • 06:31 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: magru sandbox to routed ganeti - ayounsi@cumin1003"
  • 06:29 jmm@dns1004: END - running authdns-update
  • 06:28 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Introduce policies - oblivian@cumin1003"
  • 06:28 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Introduce policies - oblivian@cumin1003
  • 06:28 jmm@dns1004: START - running authdns-update
  • 06:27 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Introduce policies - oblivian@cumin1003
  • 06:27 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Introduce policies - oblivian@cumin1003"
  • 06:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 06:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T399249)', diff saved to https://phabricator.wikimedia.org/P81578 and previous config saved to /var/cache/conftool/dbconfig/20250820-062457-fceratto.json
  • 06:02 kharlan@deploy1003: Finished scap sync-world: Backport for Enable hCaptcha on test2wiki (T382148) (duration: 11m 48s)
  • 05:57 kharlan@deploy1003: dreamyjazz, kharlan: Continuing with sync
  • 05:53 kharlan@deploy1003: dreamyjazz, kharlan: Backport for Enable hCaptcha on test2wiki (T382148) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 05:50 kharlan@deploy1003: Started scap sync-world: Backport for Enable hCaptcha on test2wiki (T382148)
  • 03:07 ejegg: fundraising python tools upgraded from 6ddfb22f to 0d6df34a
  • 00:26 btullis@cumin1003: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on P{dse-k8s-worker10[15-19].eqiad.wmnet} and (A:dse-k8s-master or A:dse-k8s-worker)

2025-08-19

  • 23:33 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579) (duration: 08m 58s)
  • 23:28 zabe@deploy1003: zabe: Continuing with sync
  • 23:27 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:24 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on more wikis (T399579)
  • 23:06 btullis@cumin1003: START - Cookbook sre.k8s.reboot-nodes rolling reboot on P{dse-k8s-worker10[15-19].eqiad.wmnet} and (A:dse-k8s-master or A:dse-k8s-worker)
  • 22:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 22:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T399249)', diff saved to https://phabricator.wikimedia.org/P81576 and previous config saved to /var/cache/conftool/dbconfig/20250819-224028-fceratto.json
  • 22:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 22:29 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 22:25 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-test-coord1002.eqiad.wmnet with reason: host reimage
  • 22:04 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 21:44 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 21:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 21:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402010)', diff saved to https://phabricator.wikimedia.org/P81575 and previous config saved to /var/cache/conftool/dbconfig/20250819-213725-ladsgroup.json
  • 21:35 jdlrobson@deploy1003: Finished scap sync-world: Backport for Revert "Stop sending more than one og:image to social media platforms" (duration: 12m 47s)
  • 21:28 jdlrobson@deploy1003: jdlrobson: Continuing with sync
  • 21:27 jdlrobson@deploy1003: jdlrobson: Backport for Revert "Stop sending more than one og:image to social media platforms" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:23 jdlrobson@deploy1003: Started scap sync-world: Backport for Revert "Stop sending more than one og:image to social media platforms"
  • 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81574 and previous config saved to /var/cache/conftool/dbconfig/20250819-212218-ladsgroup.json
  • 21:16 eevans@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 21:15 eevans@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81573 and previous config saved to /var/cache/conftool/dbconfig/20250819-210710-ladsgroup.json
  • 20:52 zabe@deploy1003: Finished scap sync-world: Backport for Restore inadvertently removed messages (T153988) (duration: 36m 31s)
  • 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T402010)', diff saved to https://phabricator.wikimedia.org/P81572 and previous config saved to /var/cache/conftool/dbconfig/20250819-205203-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T402010)', diff saved to https://phabricator.wikimedia.org/P81571 and previous config saved to /var/cache/conftool/dbconfig/20250819-204935-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402010)', diff saved to https://phabricator.wikimedia.org/P81570 and previous config saved to /var/cache/conftool/dbconfig/20250819-204913-ladsgroup.json
  • 20:40 eileen: civicrm upgraded from 51580e2e to b6f2ff27
  • 20:40 zabe@deploy1003: chlod, zabe: Continuing with sync
  • 20:39 zabe@deploy1003: chlod, zabe: Backport for Restore inadvertently removed messages (T153988) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81569 and previous config saved to /var/cache/conftool/dbconfig/20250819-203405-ladsgroup.json
  • 20:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81568 and previous config saved to /var/cache/conftool/dbconfig/20250819-201858-ladsgroup.json
  • 20:16 zabe@deploy1003: Started scap sync-world: Backport for Restore inadvertently removed messages (T153988)
  • 20:15 brett@dns1004: END - running authdns-update
  • 20:13 brett@dns1004: START - running authdns-update
  • 20:05 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on sretest2001.codfw.wmnet with reason: supermicro
  • 20:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T402010)', diff saved to https://phabricator.wikimedia.org/P81567 and previous config saved to /var/cache/conftool/dbconfig/20250819-200350-ladsgroup.json
  • 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T402010)', diff saved to https://phabricator.wikimedia.org/P81566 and previous config saved to /var/cache/conftool/dbconfig/20250819-200122-ladsgroup.json
  • 20:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 20:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402010)', diff saved to https://phabricator.wikimedia.org/P81565 and previous config saved to /var/cache/conftool/dbconfig/20250819-200100-ladsgroup.json
  • 19:52 brett@dns1004: END - running authdns-update
  • 19:51 brett@dns1004: START - running authdns-update
  • 19:50 brett: import ncmonitor 2.0.0 into bookworm-wikimedia
  • 19:47 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 19:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81564 and previous config saved to /var/cache/conftool/dbconfig/20250819-194552-ladsgroup.json
  • 19:45 mszabo@deploy1003: Finished scap sync-world: Backport for AbuseFilterHooks: Gracefully handle performers without actor records (T402298) (duration: 11m 36s)
  • 19:44 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply logging config change - bking@cumin1002 - T395571
  • 19:39 mszabo@deploy1003: mszabo: Continuing with sync
  • 19:38 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply logging config change - bking@cumin1002 - T395571
  • 19:35 mszabo@deploy1003: mszabo: Backport for AbuseFilterHooks: Gracefully handle performers without actor records (T402298) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:33 mszabo@deploy1003: Started scap sync-world: Backport for AbuseFilterHooks: Gracefully handle performers without actor records (T402298)
  • 19:32 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 55 hosts with reason: T395571
  • 19:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81563 and previous config saved to /var/cache/conftool/dbconfig/20250819-193045-ladsgroup.json
  • 19:30 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 19:24 Dreamy_Jazz: Running `/usr/local/bin/foreachwikiindblist group1.dblist extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --force --start-timestamp "20230701010101"`
  • 19:23 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 19:17 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 19:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T402010)', diff saved to https://phabricator.wikimedia.org/P81562 and previous config saved to /var/cache/conftool/dbconfig/20250819-191537-ladsgroup.json
  • 19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T402010)', diff saved to https://phabricator.wikimedia.org/P81561 and previous config saved to /var/cache/conftool/dbconfig/20250819-191311-ladsgroup.json
  • 19:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402010)', diff saved to https://phabricator.wikimedia.org/P81560 and previous config saved to /var/cache/conftool/dbconfig/20250819-191204-ladsgroup.json
  • 18:58 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81559 and previous config saved to /var/cache/conftool/dbconfig/20250819-185656-ladsgroup.json
  • 18:54 damilare: donorwiki upgraded from 373eb362 to 5dcb98fd
  • 18:47 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:44 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 18:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81558 and previous config saved to /var/cache/conftool/dbconfig/20250819-184149-ladsgroup.json
  • 18:39 swfrench@deploy1003: Finished scap sync-world: No-code-changes scap sync-world with new helmfile values - T401721 (duration: 06m 28s)
  • 18:38 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 18:36 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:36 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:34 swfrench@deploy1003: Started scap sync-world: No-code-changes scap sync-world with new helmfile values - T401721
  • 18:32 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:28 swfrench@deploy1003: Stopping before sync operations
  • 18:27 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to verify image build and dependent helmfile values - T401721
  • 18:26 jhathaway@cumin1002: START - Cookbook sre.hosts.provision for host an-test-coord1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
  • 18:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T402010)', diff saved to https://phabricator.wikimedia.org/P81557 and previous config saved to /var/cache/conftool/dbconfig/20250819-182642-ladsgroup.json
  • 18:26 dancy@deploy1003: Installation of scap version "4.206.0" completed for 2 hosts
  • 18:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T402010)', diff saved to https://phabricator.wikimedia.org/P81556 and previous config saved to /var/cache/conftool/dbconfig/20250819-182419-ladsgroup.json
  • 18:24 dancy@deploy1003: Installing scap version "4.206.0" for 2 host(s)
  • 18:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 18:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81555 and previous config saved to /var/cache/conftool/dbconfig/20250819-182356-ladsgroup.json
  • 18:22 mutante: gerrit - deactivated user Keccake256 for spam-like comments and edits on commons
  • 18:18 jhathaway@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 18:11 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host an-test-coord1002.eqiad.wmnet with OS bookworm
  • 18:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81554 and previous config saved to /var/cache/conftool/dbconfig/20250819-180848-ladsgroup.json
  • 18:04 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1174872 (duration: 07m 51s)
  • 18:00 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 17:59 rzl@deploy1003: rzl: Continuing with sync
  • 17:58 rzl@deploy1003: rzl: https://gerrit.wikimedia.org/r/1174872 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:57 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1174872
  • 17:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81553 and previous config saved to /var/cache/conftool/dbconfig/20250819-175340-ladsgroup.json
  • 17:49 jgleeson: process-control config revision changed from 80aab41e to 45c6fa38
  • 17:39 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81552 and previous config saved to /var/cache/conftool/dbconfig/20250819-173833-ladsgroup.json
  • 17:38 zoe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 17:38 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 17:37 zoe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 17:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81551 and previous config saved to /var/cache/conftool/dbconfig/20250819-173709-ladsgroup.json
  • 17:37 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 17:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402010)', diff saved to https://phabricator.wikimedia.org/P81550 and previous config saved to /var/cache/conftool/dbconfig/20250819-173646-ladsgroup.json
  • 17:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81548 and previous config saved to /var/cache/conftool/dbconfig/20250819-172139-ladsgroup.json
  • 17:17 swfrench@deploy1003: Finished scap sync-world: No-op deployment to introduce new build report metadata - T401721 (duration: 02m 52s)
  • 17:15 swfrench@deploy1003: Started scap sync-world: No-op deployment to introduce new build report metadata - T401721
  • 17:12 mszabo@deploy1003: Finished scap sync-world: Backport for AbuseFilterHooks: Handle IP user performers without actor records (T402298) (duration: 07m 38s)
  • 17:10 mutante: phab2002/phab1004 - systemctl restart php7.4-fpm after we increased APCu shared memory segment size (T401157)
  • 17:07 mszabo@deploy1003: kharlan, mszabo: Continuing with sync
  • 17:07 mszabo@deploy1003: kharlan, mszabo: Backport for AbuseFilterHooks: Handle IP user performers without actor records (T402298) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81546 and previous config saved to /var/cache/conftool/dbconfig/20250819-170632-ladsgroup.json
  • 17:05 mszabo@deploy1003: Started scap sync-world: Backport for AbuseFilterHooks: Handle IP user performers without actor records (T402298)
  • 16:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402010)', diff saved to https://phabricator.wikimedia.org/P81545 and previous config saved to /var/cache/conftool/dbconfig/20250819-165124-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T402010)', diff saved to https://phabricator.wikimedia.org/P81544 and previous config saved to /var/cache/conftool/dbconfig/20250819-165015-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81543 and previous config saved to /var/cache/conftool/dbconfig/20250819-165003-ladsgroup.json
  • 16:48 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 16:48 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 16:45 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 16:39 mszabo@deploy1003: Sync cancelled.
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81542 and previous config saved to /var/cache/conftool/dbconfig/20250819-163455-ladsgroup.json
  • 16:32 mszabo@deploy1003: mszabo, kharlan: Backport for AbuseFilterHooks: Handle IP user performers without actor records (T402298) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:30 mszabo@deploy1003: Started scap sync-world: Backport for AbuseFilterHooks: Handle IP user performers without actor records (T402298)
  • 16:23 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 16:20 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81541 and previous config saved to /var/cache/conftool/dbconfig/20250819-161948-ladsgroup.json
  • 16:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81540 and previous config saved to /var/cache/conftool/dbconfig/20250819-160439-ladsgroup.json
  • 16:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81539 and previous config saved to /var/cache/conftool/dbconfig/20250819-160230-ladsgroup.json
  • 16:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 16:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81538 and previous config saved to /var/cache/conftool/dbconfig/20250819-160218-ladsgroup.json
  • 15:50 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 15:49 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81536 and previous config saved to /var/cache/conftool/dbconfig/20250819-154711-ladsgroup.json
  • 15:35 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:35 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1042
  • 15:34 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1042
  • 15:33 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81535 and previous config saved to /var/cache/conftool/dbconfig/20250819-153203-ladsgroup.json
  • 15:31 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 15:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T402010)', diff saved to https://phabricator.wikimedia.org/P81534 and previous config saved to /var/cache/conftool/dbconfig/20250819-153015-fceratto.json
  • 15:30 jgleeson: SmashPig upgraded from 7586e8df to ebb6e309
  • 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T402010)', diff saved to https://phabricator.wikimedia.org/P81533 and previous config saved to /var/cache/conftool/dbconfig/20250819-152743-fceratto.json
  • 15:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 15:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81532 and previous config saved to /var/cache/conftool/dbconfig/20250819-152720-fceratto.json
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81531 and previous config saved to /var/cache/conftool/dbconfig/20250819-151656-ladsgroup.json
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81530 and previous config saved to /var/cache/conftool/dbconfig/20250819-151446-ladsgroup.json
  • 15:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 15:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 15:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402010)', diff saved to https://phabricator.wikimedia.org/P81529 and previous config saved to /var/cache/conftool/dbconfig/20250819-151354-ladsgroup.json
  • 15:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81528 and previous config saved to /var/cache/conftool/dbconfig/20250819-151213-fceratto.json
  • 15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1020.eqiad.wmnet with OS bullseye
  • 15:08 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:04 brennen@deploy1003: Finished deploy [phabricator/deployment@22fcde9]: deploy phab1004 for T402309 (duration: 00m 39s)
  • 15:03 brennen@deploy1003: Started deploy [phabricator/deployment@22fcde9]: deploy phab1004 for T402309
  • 15:03 brennen@deploy1003: Finished deploy [phabricator/deployment@22fcde9]: deploy phab2002 for T402309 (duration: 00m 42s)
  • 15:02 brennen@deploy1003: Started deploy [phabricator/deployment@22fcde9]: deploy phab2002 for T402309
  • 15:01 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T402309
  • 14:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81527 and previous config saved to /var/cache/conftool/dbconfig/20250819-145847-ladsgroup.json
  • 14:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81526 and previous config saved to /var/cache/conftool/dbconfig/20250819-145706-fceratto.json
  • 14:56 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 14:55 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 14:54 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 14:53 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 14:53 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 14:53 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 14:52 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
  • 14:51 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
  • 14:48 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1020.eqiad.wmnet with reason: host reimage
  • 14:47 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:45 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:45 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main'.
  • 14:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1020.eqiad.wmnet with reason: host reimage
  • 14:44 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P81525 and previous config saved to /var/cache/conftool/dbconfig/20250819-144339-ladsgroup.json
  • 14:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81524 and previous config saved to /var/cache/conftool/dbconfig/20250819-144158-fceratto.json
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T402010)', diff saved to https://phabricator.wikimedia.org/P81523 and previous config saved to /var/cache/conftool/dbconfig/20250819-143926-fceratto.json
  • 14:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:39 dreamyjazz@deploy1003: Finished scap sync-world: Backport for UserInfoCard: Link to metawiki for Special:CentralAuth links (T397690) (duration: 13m 09s)
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81522 and previous config saved to /var/cache/conftool/dbconfig/20250819-143903-fceratto.json
  • 14:37 Dreamy_Jazz: Running `/usr/local/bin/foreachwikiindblist group2.dblist extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --force --start-timestamp "20230701010101"`
  • 14:37 Dreamy_Jazz: Running `/usr/local/bin/foreachwikiindblist group0.dblist extensions/MediaModeration/maintenance/importExistingFilesToScanTable.php --force --start-timestamp "20230701010101"`
  • 14:31 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 14:30 _joe_: running requestctl-admin upgrade-schema pattern on alert1002
  • 14:30 dreamyjazz@deploy1003: dreamyjazz: Backport for UserInfoCard: Link to metawiki for Special:CentralAuth links (T397690) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T402010)', diff saved to https://phabricator.wikimedia.org/P81521 and previous config saved to /var/cache/conftool/dbconfig/20250819-142832-ladsgroup.json
  • 14:26 dreamyjazz@deploy1003: Started scap sync-world: Backport for UserInfoCard: Link to metawiki for Special:CentralAuth links (T397690)
  • 14:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T402010)', diff saved to https://phabricator.wikimedia.org/P81519 and previous config saved to /var/cache/conftool/dbconfig/20250819-142514-ladsgroup.json
  • 14:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402010)', diff saved to https://phabricator.wikimedia.org/P81518 and previous config saved to /var/cache/conftool/dbconfig/20250819-142420-ladsgroup.json
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81517 and previous config saved to /var/cache/conftool/dbconfig/20250819-142355-fceratto.json
  • 14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:11 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:10 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81516 and previous config saved to /var/cache/conftool/dbconfig/20250819-140913-ladsgroup.json
  • 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1017.eqiad.wmnet with OS bullseye
  • 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81515 and previous config saved to /var/cache/conftool/dbconfig/20250819-140848-fceratto.json
  • 14:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1018.eqiad.wmnet with OS bullseye
  • 14:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:05 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 14:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1019.eqiad.wmnet with OS bullseye
  • 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P81514 and previous config saved to /var/cache/conftool/dbconfig/20250819-135405-ladsgroup.json
  • 13:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81513 and previous config saved to /var/cache/conftool/dbconfig/20250819-135340-fceratto.json
  • 13:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T402010)', diff saved to https://phabricator.wikimedia.org/P81511 and previous config saved to /var/cache/conftool/dbconfig/20250819-135112-fceratto.json
  • 13:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:49 _joe_: systemctl reload varnish-frontend.service on cp4039
  • 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1017.eqiad.wmnet with reason: host reimage
  • 13:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1018.eqiad.wmnet with reason: host reimage
  • 13:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1020.eqiad.wmnet with OS bullseye
  • 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:39 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1017.eqiad.wmnet with reason: host reimage
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T402010)', diff saved to https://phabricator.wikimedia.org/P81510 and previous config saved to /var/cache/conftool/dbconfig/20250819-133858-ladsgroup.json
  • 13:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1019.eqiad.wmnet with reason: host reimage
  • 13:36 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1018.eqiad.wmnet with reason: host reimage
  • 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T402010)', diff saved to https://phabricator.wikimedia.org/P81509 and previous config saved to /var/cache/conftool/dbconfig/20250819-133537-ladsgroup.json
  • 13:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81508 and previous config saved to /var/cache/conftool/dbconfig/20250819-133515-ladsgroup.json
  • 13:35 kart_: Updated Recommendation API to 2025-07-25-064834-production (T399117)
  • 13:34 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 13:34 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 13:34 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1003"
  • 13:33 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1019.eqiad.wmnet with reason: host reimage
  • 13:33 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81507 and previous config saved to /var/cache/conftool/dbconfig/20250819-132007-ladsgroup.json
  • 13:19 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:14 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 13:13 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1017.eqiad.wmnet with OS bullseye
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1018.eqiad.wmnet with OS bullseye
  • 13:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 13:10 vriley@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1044.eqiad.wmnet with reason: host reimage
  • 13:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ms-fe1019.eqiad.wmnet with OS bullseye
  • 13:06 moritzm: restart slapd on main LDAP r/w server hosts to pick up GNU TLS security updates
  • 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P81506 and previous config saved to /var/cache/conftool/dbconfig/20250819-130500-ladsgroup.json
  • 13:03 moritzm: restart FPM on Phabricator hosts to pick up GNU TLS security updates
  • 13:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:02 moritzm: restart Exim on Phabricator hosts to pick up GNU TLS security updates
  • 13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
  • 12:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:59 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
  • 12:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81505 and previous config saved to /var/cache/conftool/dbconfig/20250819-124952-ladsgroup.json
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T402010)', diff saved to https://phabricator.wikimedia.org/P81504 and previous config saved to /var/cache/conftool/dbconfig/20250819-124731-ladsgroup.json
  • 12:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402010)', diff saved to https://phabricator.wikimedia.org/P81503 and previous config saved to /var/cache/conftool/dbconfig/20250819-124709-ladsgroup.json
  • 12:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:43 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host ms-fe1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 12:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81502 and previous config saved to /var/cache/conftool/dbconfig/20250819-123201-ladsgroup.json
  • 12:28 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host ms-fe1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for - jclark@cumin1002"
  • 12:18 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for - jclark@cumin1002"
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P81501 and previous config saved to /var/cache/conftool/dbconfig/20250819-121654-ladsgroup.json
  • 12:15 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:15 moritzm: installing gnutls28 security updates on bullseye
  • 12:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:06 hashar: Restarting Jenkins
  • 12:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T402010)', diff saved to https://phabricator.wikimedia.org/P81500 and previous config saved to /var/cache/conftool/dbconfig/20250819-120147-ladsgroup.json
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T402010)', diff saved to https://phabricator.wikimedia.org/P81499 and previous config saved to /var/cache/conftool/dbconfig/20250819-115926-ladsgroup.json
  • 11:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402010)', diff saved to https://phabricator.wikimedia.org/P81498 and previous config saved to /var/cache/conftool/dbconfig/20250819-115915-ladsgroup.json
  • 11:51 moritzm: installing openjdk-21 security updates
  • 11:45 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2204* gradually with 4 steps - Upgraded MariaDB
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81497 and previous config saved to /var/cache/conftool/dbconfig/20250819-114407-ladsgroup.json
  • 11:38 moritzm: uploaded openjdk-21 21.0.8+9-1~deb12u1 to bookworm-wikimedia (backport of latest security release)
  • 11:37 vriley@cumin1003: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 11:35 vriley@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P81496 and previous config saved to /var/cache/conftool/dbconfig/20250819-112900-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T402010)', diff saved to https://phabricator.wikimedia.org/P81495 and previous config saved to /var/cache/conftool/dbconfig/20250819-111353-ladsgroup.json
  • 11:13 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:11 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1044
  • 11:11 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1044
  • 11:10 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T402010)', diff saved to https://phabricator.wikimedia.org/P81494 and previous config saved to /var/cache/conftool/dbconfig/20250819-110931-ladsgroup.json
  • 11:09 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 11:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T402010)', diff saved to https://phabricator.wikimedia.org/P81493 and previous config saved to /var/cache/conftool/dbconfig/20250819-110909-ladsgroup.json
  • 11:08 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 11:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1003.eqiad.wmnet with OS trixie
  • 11:02 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2204* gradually with 4 steps - Upgraded MariaDB
  • 10:56 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81491 and previous config saved to /var/cache/conftool/dbconfig/20250819-105401-ladsgroup.json
  • 10:52 vriley@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:51 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2204.codfw.wmnet
  • 10:47 vriley@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:46 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 10:46 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1003.eqiad.wmnet with reason: host reimage
  • 10:45 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1044
  • 10:45 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1044
  • 10:45 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2204 - Upgrading db2204.codfw.wmnet
  • 10:45 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2204 - Upgrading db2204.codfw.wmnet
  • 10:45 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2204.codfw.wmnet
  • 10:45 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1044 - vriley@cumin1003"
  • 10:44 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1044 - vriley@cumin1003"
  • 10:40 vriley@cumin1003: START - Cookbook sre.dns.netbox
  • 10:39 moritzm: installing openjdk-17 security updates
  • 10:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P81489 and previous config saved to /var/cache/conftool/dbconfig/20250819-103854-ladsgroup.json
  • 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T402276', diff saved to https://phabricator.wikimedia.org/P81488 and previous config saved to /var/cache/conftool/dbconfig/20250819-103402-fceratto.json
  • 10:33 federico3: Starting s2 codfw failover from db2204 to db2207 - T402276
  • 10:33 stevemunene@dns1004: END - running authdns-update
  • 10:31 stevemunene@dns1004: START - running authdns-update
  • 10:24 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T402276', diff saved to https://phabricator.wikimedia.org/P81487 and previous config saved to /var/cache/conftool/dbconfig/20250819-102414-fceratto.json
  • 10:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T402010)', diff saved to https://phabricator.wikimedia.org/P81486 and previous config saved to /var/cache/conftool/dbconfig/20250819-102346-ladsgroup.json
  • 10:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T402276
  • 10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T402010)', diff saved to https://phabricator.wikimedia.org/P81485 and previous config saved to /var/cache/conftool/dbconfig/20250819-102126-ladsgroup.json
  • 10:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host sretest1003.eqiad.wmnet with OS trixie
  • 10:20 claime: Fractional routing support for rest API deployed - T400131
  • 08:56 jnuche@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.15 refs T396376
  • 08:22 aikochou@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 07:37 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.15 refs T396376
  • 07:33 kart_: Updated cxserver to 2025-08-14-134810-production (T399117, T393705)
  • 07:33 kartik@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:32 kartik@deploy1003: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:31 kartik@deploy1003: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:30 kartik@deploy1003: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:27 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:27 kartik@deploy1003: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:15 kartik@deploy1003: Finished scap sync-world: Backport for Content Translation: Remove unused configuration parameter (T400671) (duration: 12m 26s)
  • 07:10 kartik@deploy1003: kartik: Continuing with sync
  • 07:04 kartik@deploy1003: kartik: Backport for Content Translation: Remove unused configuration parameter (T400671) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:02 kartik@deploy1003: Started scap sync-world: Backport for Content Translation: Remove unused configuration parameter (T400671)
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.12 (duration: 04m 23s)
  • 02:57 eileen: civicrm upgraded from 25bc1c7b to 51580e2e
  • 01:03 maryum: undeploy security fix for T397396
  • 00:23 denisse: Clearing corrupted logs on ms-be1071 - T402247
  • 00:11 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579) (duration: 37m 15s)

2025-08-18

  • 23:59 zabe@deploy1003: zabe: Continuing with sync
  • 23:55 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 23:34 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579)
  • 23:26 zabe@deploy1003: sync-world aborted: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579) (duration: 02m 11s)
  • 23:24 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579)
  • 23:19 zabe@deploy1003: sync-world aborted: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579) (duration: 01m 49s)
  • 23:18 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579)
  • 23:17 zabe@deploy1003: sync-world aborted: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579) (duration: 13m 48s)
  • 23:03 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on a few large wikis (T399579)
  • 23:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1071.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 22:40 Daimona: Manually dropping DB rows in wikishared causing fatals # T402239#11096385
  • 22:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1071.eqiad.wmnet with reason: vacuum overlarge container dbs
  • 22:36 maryum: Deployed security patches for several extensions
  • 22:31 maryum: Deployed security fix for T402075
  • 22:01 sbassett: Removed primary mitigation for T400697
  • 21:58 bking@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 21:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics[1072-1077].eqiad.wmnet
  • 21:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:53 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics[1072-1077].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 21:53 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: analytics[1072-1077].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 21:53 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1007.eqiad.wmnet with OS bookworm
  • 21:53 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 21:50 dancy@deploy1003: Installation of scap version "4.205.0" completed for 2 hosts
  • 21:48 dancy@deploy1003: Installing scap version "4.205.0" for 2 host(s)
  • 21:45 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 21:42 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 21:35 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 55 hosts with reason: T395571
  • 21:33 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply logging config change - bking@cumin1002 - T395571
  • 21:19 dancy@deploy1003: Sync cancelled.
  • 21:18 dancy@deploy1003: dancy: testing synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:17 dancy@deploy1003: sync-world aborted: testing (duration: 04m 48s)
  • 21:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P81480 and previous config saved to /var/cache/conftool/dbconfig/20250818-211234-fceratto.json
  • 21:12 dancy@deploy1003: Started scap sync-world: testing
  • 20:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81479 and previous config saved to /var/cache/conftool/dbconfig/20250818-205726-fceratto.json
  • 20:57 dancy@deploy1003: Installation of scap version "4.203.0" completed for 169 hosts
  • 20:53 dancy@deploy1003: Installing scap version "4.203.0" for 169 host(s)
  • 20:44 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-backup-datanode1033.eqiad.wmnet with OS bookworm
  • 20:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P81478 and previous config saved to /var/cache/conftool/dbconfig/20250818-204219-fceratto.json
  • 20:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P81477 and previous config saved to /var/cache/conftool/dbconfig/20250818-202712-fceratto.json
  • 20:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T399249)', diff saved to https://phabricator.wikimedia.org/P81476 and previous config saved to /var/cache/conftool/dbconfig/20250818-202602-fceratto.json
  • 20:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 20:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P81475 and previous config saved to /var/cache/conftool/dbconfig/20250818-202539-fceratto.json
  • 20:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81474 and previous config saved to /var/cache/conftool/dbconfig/20250818-201031-fceratto.json
  • 19:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P81473 and previous config saved to /var/cache/conftool/dbconfig/20250818-195524-fceratto.json
  • 19:48 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 19:46 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy support for structured provenance patterns - swfrench@cumin2002 - T401430"
  • 19:46 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy support for structured provenance patterns - swfrench@cumin2002 - T401430
  • 19:45 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy support for structured provenance patterns - swfrench@cumin2002 - T401430
  • 19:45 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy support for structured provenance patterns - swfrench@cumin2002 - T401430"
  • 19:45 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 19:44 jhathaway@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on an-test-coord1002.eqiad.wmnet with reason: supermicro
  • 19:42 btullis@cumin1003: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts an-backup-datanode1032.eqiad.wmnet
  • 19:42 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-backup-datanode1032.eqiad.wmnet
  • 19:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P81472 and previous config saved to /var/cache/conftool/dbconfig/20250818-194017-fceratto.json
  • 19:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T399249)', diff saved to https://phabricator.wikimedia.org/P81471 and previous config saved to /var/cache/conftool/dbconfig/20250818-193907-fceratto.json
  • 19:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 19:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T399249)', diff saved to https://phabricator.wikimedia.org/P81470 and previous config saved to /var/cache/conftool/dbconfig/20250818-193844-fceratto.json
  • 19:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1007.eqiad.wmnet with reason: host reimage
  • 19:30 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1007.eqiad.wmnet with reason: host reimage
  • 19:24 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_eqiad: apply logging config change - bking@cumin1002 - T395571
  • 19:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81469 and previous config saved to /var/cache/conftool/dbconfig/20250818-192337-fceratto.json
  • 19:22 bking@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 55 hosts with reason: T395571
  • 19:16 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts analytics[1072-1077].eqiad.wmnet
  • 19:15 bking@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 19:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1033.eqiad.wmnet with OS bookworm
  • 19:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-backup-datanode1033.eqiad.wmnet with OS bookworm
  • 19:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P81467 and previous config saved to /var/cache/conftool/dbconfig/20250818-190830-fceratto.json
  • 19:07 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1007.eqiad.wmnet with OS bookworm
  • 19:05 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1007
  • 19:04 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1033
  • 19:04 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1007
  • 19:02 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1033
  • 19:02 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:02 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed analytics1071 to an-backup-datanode1007 - btullis@cumin1003"
  • 19:02 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed analytics1071 to an-backup-datanode1007 - btullis@cumin1003"
  • 18:56 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T399249)', diff saved to https://phabricator.wikimedia.org/P81466 and previous config saved to /var/cache/conftool/dbconfig/20250818-185322-fceratto.json
  • 18:51 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T399249)', diff saved to https://phabricator.wikimedia.org/P81465 and previous config saved to /var/cache/conftool/dbconfig/20250818-185111-fceratto.json
  • 18:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 18:50 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 18:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P81464 and previous config saved to /var/cache/conftool/dbconfig/20250818-185031-fceratto.json
  • 18:50 swfrench@deploy1003: Finished scap sync-world: Test deploy to investigate spurious full builds (duration: 02m 19s)
  • 18:48 swfrench@deploy1003: Started scap sync-world: Test deploy to investigate spurious full builds
  • 18:35 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 18:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81462 and previous config saved to /var/cache/conftool/dbconfig/20250818-183524-fceratto.json
  • 18:33 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 18:33 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply logging config change - bking@cumin1002 - T395571
  • 18:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P81461 and previous config saved to /var/cache/conftool/dbconfig/20250818-182017-fceratto.json
  • 18:20 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 18:15 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 18:15 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 18:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts analytics1071.eqiad.wmnet
  • 18:14 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 18:09 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1033.eqiad.wmnet with OS bookworm
  • 18:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1032.eqiad.wmnet with OS bookworm
  • 18:07 btullis@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-backup-datanode1033
  • 18:07 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1033
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1033 to an-backup-datanode1033 - btullis@cumin1003"
  • 18:07 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1033 to an-backup-datanode1033 - btullis@cumin1003"
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1006.eqiad.wmnet with OS bookworm
  • 18:07 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 18:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P81460 and previous config saved to /var/cache/conftool/dbconfig/20250818-180509-fceratto.json
  • 18:04 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 18:04 damilare: SmashPig upgraded from 3d1d6f15 to 7586e8df
  • 18:03 swfrench@deploy1003: Finished scap sync-world: Deploy new images after verifying dependent helmfile values - T401721 (duration: 36m 38s)
  • 18:03 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T399249)', diff saved to https://phabricator.wikimedia.org/P81459 and previous config saved to /var/cache/conftool/dbconfig/20250818-180259-fceratto.json
  • 18:02 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P81458 and previous config saved to /var/cache/conftool/dbconfig/20250818-180247-fceratto.json
  • 18:02 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts analytics1071.eqiad.wmnet
  • 18:01 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 17:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1032.eqiad.wmnet with reason: host reimage
  • 17:49 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1032.eqiad.wmnet with reason: host reimage
  • 17:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1006.eqiad.wmnet with reason: host reimage
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81457 and previous config saved to /var/cache/conftool/dbconfig/20250818-174740-fceratto.json
  • 17:44 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1006.eqiad.wmnet with reason: host reimage
  • 17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P81456 and previous config saved to /var/cache/conftool/dbconfig/20250818-173232-fceratto.json
  • 17:30 swfrench@deploy1003: Started scap sync-world: Deploy new images after verifying dependent helmfile values - T401721
  • 17:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1035.eqiad.wmnet with OS bookworm
  • 17:25 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 17:24 swfrench@deploy1003: Stopping before sync operations
  • 17:23 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1032.eqiad.wmnet with OS bookworm
  • 17:21 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1034.eqiad.wmnet with OS bookworm
  • 17:21 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1006.eqiad.wmnet with OS bookworm
  • 17:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1005.eqiad.wmnet with OS bookworm
  • 17:19 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 17:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P81455 and previous config saved to /var/cache/conftool/dbconfig/20250818-171725-fceratto.json
  • 17:15 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 17:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T399249)', diff saved to https://phabricator.wikimedia.org/P81454 and previous config saved to /var/cache/conftool/dbconfig/20250818-171515-fceratto.json
  • 17:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 17:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81453 and previous config saved to /var/cache/conftool/dbconfig/20250818-171452-fceratto.json
  • 17:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1035.eqiad.wmnet with reason: host reimage
  • 17:05 swfrench@deploy1003: Started scap sync-world: Non-deploy scap run to verify image build and dependent helmfile values - T401721
  • 17:04 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1034.eqiad.wmnet with reason: host reimage
  • 17:04 dancy@deploy1003: Installation of scap version "4.202.0" completed for 2 hosts
  • 17:03 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1035.eqiad.wmnet with reason: host reimage
  • 17:02 dancy@deploy1003: Installing scap version "4.202.0" for 2 host(s)
  • 17:02 arlolra@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 17:01 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1034.eqiad.wmnet with reason: host reimage
  • 17:00 arlolra@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 16:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81452 and previous config saved to /var/cache/conftool/dbconfig/20250818-165945-fceratto.json
  • 16:57 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 16:56 stevemunene@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 16:55 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 16:52 arlolra@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 16:51 arlolra@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 16:50 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply logging config change - bking@cumin1002 - T395571
  • 16:49 bking@cumin1002: conftool action : set/weight=10; selector: name=cirrussearch2091.
  • 16:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P81451 and previous config saved to /var/cache/conftool/dbconfig/20250818-164437-fceratto.json
  • 16:44 arlolra@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:42 bking@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply logging config change - bking@cumin1002 - T395571
  • 16:42 arlolra@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:40 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: apply logging config change - bking@cumin1002 - T395571
  • 16:39 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1005.eqiad.wmnet with reason: host reimage
  • 16:38 stevemunene@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 16:36 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:35 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1035.eqiad.wmnet with OS bookworm
  • 16:35 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1034.eqiad.wmnet with OS bookworm
  • 16:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1035
  • 16:34 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:34 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 16:33 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1005.eqiad.wmnet with reason: host reimage
  • 16:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1035
  • 16:32 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1006
  • 16:32 rzl@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 16:31 rzl@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 16:31 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1006
  • 16:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:30 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1035 to an-backup-datanode1035 - btullis@cumin1003"
  • 16:29 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1035 to an-backup-datanode1035 - btullis@cumin1003"
  • 16:29 bking@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 16:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81450 and previous config saved to /var/cache/conftool/dbconfig/20250818-162930-fceratto.json
  • 16:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T399249)', diff saved to https://phabricator.wikimedia.org/P81449 and previous config saved to /var/cache/conftool/dbconfig/20250818-162720-fceratto.json
  • 16:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 16:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P81448 and previous config saved to /var/cache/conftool/dbconfig/20250818-162656-fceratto.json
  • 16:25 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1036.eqiad.wmnet with OS bookworm
  • 16:22 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts analytics1070.eqiad.wmnet
  • 16:22 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 16:21 bking@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 16:20 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1037.eqiad.wmnet with OS bookworm
  • 16:20 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:17 jdrewniak@deploy1003: Finished scap sync-world: Backport for Fix type declaration for nonexistent event cache (T401952) (duration: 11m 13s)
  • 16:13 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts analytics1070.eqiad.wmnet
  • 16:11 jdrewniak@deploy1003: daimona, jdrewniak: Continuing with sync
  • 16:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81447 and previous config saved to /var/cache/conftool/dbconfig/20250818-161149-fceratto.json
  • 16:11 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1005.eqiad.wmnet with OS bookworm
  • 16:11 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1036.eqiad.wmnet with reason: host reimage
  • 16:07 jdrewniak@deploy1003: daimona, jdrewniak: Backport for Fix type declaration for nonexistent event cache (T401952) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:06 jdrewniak@deploy1003: Started scap sync-world: Backport for Fix type declaration for nonexistent event cache (T401952)
  • 16:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1037.eqiad.wmnet with reason: host reimage
  • 16:01 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:01 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply logging config change - bking@cumin1002 - T395571
  • 15:59 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1036.eqiad.wmnet with reason: host reimage
  • 15:57 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1005
  • 15:57 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1037.eqiad.wmnet with reason: host reimage
  • 15:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P81446 and previous config saved to /var/cache/conftool/dbconfig/20250818-155642-fceratto.json
  • 15:56 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 49s)
  • 15:56 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1005
  • 15:55 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:55 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1069 to an-backup-datanode1005 - btullis@cumin1003"
  • 15:55 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1069 to an-backup-datanode1005 - btullis@cumin1003"
  • 15:54 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 34s)
  • 15:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P81445 and previous config saved to /var/cache/conftool/dbconfig/20250818-154134-fceratto.json
  • 15:40 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T399249)', diff saved to https://phabricator.wikimedia.org/P81444 and previous config saved to /var/cache/conftool/dbconfig/20250818-154024-fceratto.json
  • 15:40 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 15:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P81443 and previous config saved to /var/cache/conftool/dbconfig/20250818-154012-fceratto.json
  • 15:36 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1069.eqiad.wmnet
  • 15:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 15:33 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1069.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 15:32 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply logging config change - bking@cumin1002 - T395571
  • 15:31 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1036.eqiad.wmnet with OS bookworm
  • 15:31 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1037.eqiad.wmnet with OS bookworm
  • 15:31 bking@cumin1002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: apply logging config change - bking@cumin1002 - T395571
  • 15:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1004.eqiad.wmnet with OS bookworm
  • 15:30 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 15:30 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1037.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81442 and previous config saved to /var/cache/conftool/dbconfig/20250818-152505-fceratto.json
  • 15:24 bking@cumin1002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: apply logging config change - bking@cumin1002 - T395571
  • 15:17 logmsgbot: mszabo Deployed security patch for T400892
  • 15:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P81441 and previous config saved to /var/cache/conftool/dbconfig/20250818-150958-fceratto.json
  • 15:05 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 15:05 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 15:01 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 15:00 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 15:00 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 14:59 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P81440 and previous config saved to /var/cache/conftool/dbconfig/20250818-145450-fceratto.json
  • 14:52 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T399249)', diff saved to https://phabricator.wikimedia.org/P81439 and previous config saved to /var/cache/conftool/dbconfig/20250818-145240-fceratto.json
  • 14:52 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 14:50 tchin@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:49 tchin@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 14:48 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 14:47 tchin@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:46 tchin@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 14:45 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:45 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:45 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 14:45 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 14:40 tchin@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 14:39 tchin@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 14:39 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:sessionstore: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 14:39 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 14:39 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 14:38 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:37 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:37 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 14:37 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 14:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1038.eqiad.wmnet with OS bookworm
  • 14:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1004.eqiad.wmnet with reason: host reimage
  • 14:28 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-eqiad
  • 14:27 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1004.eqiad.wmnet with reason: host reimage
  • 14:21 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1038.eqiad.wmnet with reason: host reimage
  • 14:14 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1069.eqiad.wmnet
  • 14:13 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1038.eqiad.wmnet with reason: host reimage
  • 14:10 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1037.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:09 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-eqiad
  • 14:07 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1037
  • 14:06 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1037
  • 14:05 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:05 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1037 to an-backup-datanode1037 - btullis@cumin1003"
  • 14:05 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-backup-namenode1037 to an-backup-datanode1037 - btullis@cumin1003"
  • 14:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:sessionstore: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 14:03 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1004.eqiad.wmnet with OS bookworm
  • 14:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:02 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 14:02 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 14:01 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:01 herron@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-logging-codfw
  • 13:56 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 13:56 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 13:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1003.eqiad.wmnet with OS bookworm
  • 13:55 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 13:52 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 13:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1039.eqiad.wmnet with OS bookworm
  • 13:49 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:47 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1038.eqiad.wmnet with OS bookworm
  • 13:46 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1004
  • 13:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1040.eqiad.wmnet with OS bookworm
  • 13:45 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1004
  • 13:44 herron@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-logging-codfw
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:42 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1068 to an-backup-datanode1004 - btullis@cumin1003"
  • 13:42 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1068 to an-backup-datanode1004 - btullis@cumin1003"
  • 13:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1003.eqiad.wmnet with reason: host reimage
  • 13:36 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:34 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 13:33 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1039.eqiad.wmnet with reason: host reimage
  • 13:30 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:29 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1040.eqiad.wmnet with reason: host reimage
  • 13:28 sukhe@dns1004: END - running authdns-update
  • 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1068.eqiad.wmnet
  • 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:27 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 13:27 sukhe@dns1004: START - running authdns-update
  • 13:26 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1039.eqiad.wmnet with reason: host reimage
  • 13:25 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1003.eqiad.wmnet with reason: host reimage
  • 13:24 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2179* gradually with 4 steps - Upgrade MariaDB
  • 13:24 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1068.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 13:24 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1040.eqiad.wmnet with reason: host reimage
  • 13:20 fceratto@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 13:15 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 13:14 moritzm: imported bird2 2.17.1+branch.mq.bgp.multilisten.c47b08a1524c-cznic.1 into component/bird-routed-ganeti for Bookworm T362392
  • 13:13 jforrester@deploy1003: Finished scap sync-world: Backport for [metawiki] Set site name to 'Meta-Wiki', not just 'Meta' (T399843), Enable IP Reveal on Special:AbuseLog, eswiki, commons, wikidatawiki: IP cap lift for wikipedia workshop on 2025-August-23 (T401745), [zhwikisource] Set noindex,nofollow for namespaces User and User Talk (T401070), [
  • 13:08 jforrester@deploy1003: mszwarc, eggroll97, gergesshamon, anzx, jforrester: Continuing with sync
  • 13:07 Lucas_WMDE: (cont.) User Talk (T401070)]], Add Oath log to bureaucrats (T401350) synced to the testservers
  • 13:06 jforrester@deploy1003: mszwarc, eggroll97, gergesshamon, anzx, jforrester: Backport for [metawiki] Set site name to 'Meta-Wiki', not just 'Meta' (T399843), Enable IP Reveal on Special:AbuseLog, eswiki, commons, wikidatawiki: IP cap lift for wikipedia workshop on 2025-August-23 (T401745), [[gerrit:1179242|[zhwikisource] Set noindex,nofollow for namespaces User an
  • 13:05 jforrester@deploy1003: Started scap sync-world: Backport for [metawiki] Set site name to 'Meta-Wiki', not just 'Meta' (T399843), Enable IP Reveal on Special:AbuseLog, eswiki, commons, wikidatawiki: IP cap lift for wikipedia workshop on 2025-August-23 (T401745), [zhwikisource] Set noindex,nofollow for namespaces User and User Talk (T401070), [[
  • 13:01 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1068.eqiad.wmnet
  • 12:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1003.eqiad.wmnet with OS bookworm
  • 12:59 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1039.eqiad.wmnet with OS bookworm
  • 12:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1040.eqiad.wmnet with OS bookworm
  • 12:58 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1002.eqiad.wmnet with OS bookworm
  • 12:58 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 12:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1041.eqiad.wmnet with OS bookworm
  • 12:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for Reduce default recentchanges query time on large wikis to 1 day (T399455) (duration: 12m 36s)
  • 12:45 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 12:42 ladsgroup@deploy1003: zabe, ladsgroup: Continuing with sync
  • 12:41 ladsgroup@deploy1003: zabe, ladsgroup: Backport for Reduce default recentchanges query time on large wikis to 1 day (T399455) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1042.eqiad.wmnet with OS bookworm
  • 12:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1041.eqiad.wmnet with reason: host reimage
  • 12:38 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2179* gradually with 4 steps - Upgrade MariaDB
  • 12:37 ladsgroup@deploy1003: Started scap sync-world: Backport for Reduce default recentchanges query time on large wikis to 1 day (T399455)
  • 12:36 moritzm: installing apache2 security updates
  • 12:35 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1041.eqiad.wmnet with reason: host reimage
  • 12:34 ladsgroup@deploy1003: Finished scap sync-world: Backport for Introduce rights for checking constraints (T401789), Check permission to check constraints (T401789) (duration: 38m 07s)
  • 12:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1002.eqiad.wmnet with reason: host reimage
  • 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1042.eqiad.wmnet with reason: host reimage
  • 12:23 claime: sudo puppet cert clean push-notifications.discovery.wmnet - T402183
  • 12:23 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2179.codfw.wmnet
  • 12:21 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:21 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:20 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1003
  • 12:19 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1003
  • 12:19 ladsgroup@deploy1003: ladsgroup: Backport for Introduce rights for checking constraints (T401789), Check permission to check constraints (T401789) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:19 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1002.eqiad.wmnet with reason: host reimage
  • 12:19 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:19 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1067 to an-backup-datanode1003 - btullis@cumin1003"
  • 12:18 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1067 to an-backup-datanode1003 - btullis@cumin1003"
  • 12:17 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1042.eqiad.wmnet with reason: host reimage
  • 12:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2179 - Upgrading db2179.codfw.wmnet
  • 12:15 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2179 - Upgrading db2179.codfw.wmnet
  • 12:15 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2179.codfw.wmnet
  • 12:09 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1041.eqiad.wmnet with OS bookworm
  • 12:08 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1043.eqiad.wmnet with OS bookworm
  • 12:04 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1067.eqiad.wmnet
  • 12:04 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:04 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:04 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1067.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:00 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 11:56 ladsgroup@deploy1003: Started scap sync-world: Backport for Introduce rights for checking constraints (T401789), Check permission to check constraints (T401789)
  • 11:54 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1067.eqiad.wmnet
  • 11:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1002.eqiad.wmnet with OS bookworm
  • 11:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-backup-datanode1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:50 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1042.eqiad.wmnet with OS bookworm
  • 11:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1043.eqiad.wmnet with reason: host reimage
  • 11:48 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:47 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:44 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1043.eqiad.wmnet with reason: host reimage
  • 11:32 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1044.eqiad.wmnet with OS bookworm
  • 11:31 btullis@cumin1003: START - Cookbook sre.hosts.provision for host an-backup-datanode1002.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:28 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-eqiad
  • 11:26 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-eqiad
  • 11:25 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1002
  • 11:23 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1002
  • 11:23 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:22 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1066 to an-backup-datanode1002 - btullis@cumin1003"
  • 11:19 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1043.eqiad.wmnet with OS bookworm
  • 11:19 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renamed an-worker1066 to an-backup-datanode1002 - btullis@cumin1003"
  • 11:17 jmm@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe-codfw
  • 11:15 jmm@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe-codfw
  • 11:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1044.eqiad.wmnet with reason: host reimage
  • 11:14 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1045.eqiad.wmnet with OS bookworm
  • 11:11 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1044.eqiad.wmnet with reason: host reimage
  • 11:11 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 11:03 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1066.eqiad.wmnet
  • 11:03 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:59 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:59 moritzm: installing libxml2 security updates
  • 10:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1045.eqiad.wmnet with reason: host reimage
  • 10:54 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1045.eqiad.wmnet with reason: host reimage
  • 10:51 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1066.eqiad.wmnet
  • 10:46 fceratto@cumin1002: dbctl commit (dc=all): 'db2179: configure API group', diff saved to https://phabricator.wikimedia.org/P81429 and previous config saved to /var/cache/conftool/dbconfig/20250818-104617-fceratto.json
  • 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1044.eqiad.wmnet with OS bookworm
  • 10:42 Amir1: dropped _echo_target_page_new in aawiki x1 (T399302)
  • 10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2240 to s4 primary T402171', diff saved to https://phabricator.wikimedia.org/P81428 and previous config saved to /var/cache/conftool/dbconfig/20250818-104158-fceratto.json
  • 10:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1102 to an-backup-datanode1032
  • 10:40 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1032
  • 10:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 10:39 federico3: Starting s4 codfw failover from db2179 to db2240 - T402171
  • 10:36 ladsgroup@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 10:31 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1032
  • 10:31 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1032 on all recursors
  • 10:31 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1032 on all recursors
  • 10:31 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:31 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1102 to an-backup-datanode1032 - btullis@cumin1003"
  • 10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2240 from API/vslow/dump T402171', diff saved to https://phabricator.wikimedia.org/P81427 and previous config saved to /var/cache/conftool/dbconfig/20250818-102921-fceratto.json
  • 10:29 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1045.eqiad.wmnet with OS bookworm
  • 10:28 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1102 to an-backup-datanode1032 - btullis@cumin1003"
  • 10:28 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2240 with weight 0 T402171', diff saved to https://phabricator.wikimedia.org/P81426 and previous config saved to /var/cache/conftool/dbconfig/20250818-102826-fceratto.json
  • 10:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T402171
  • 10:22 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 10:22 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 10:19 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1103 to an-backup-namenode1033
  • 10:19 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:18 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1102 to an-backup-datanode1032
  • 10:18 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-namenode1033
  • 10:17 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-namenode1033
  • 10:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-namenode1033 on all recursors
  • 10:17 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-namenode1033 on all recursors
  • 10:17 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1103 to an-backup-namenode1033 - btullis@cumin1003"
  • 10:16 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1104 to an-backup-datanode1034
  • 10:15 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1034
  • 10:15 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1103 to an-backup-namenode1033 - btullis@cumin1003"
  • 10:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 10:07 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:06 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1034
  • 10:06 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1034 on all recursors
  • 10:06 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1034 on all recursors
  • 10:06 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:06 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1104 to an-backup-datanode1034 - btullis@cumin1003"
  • 10:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 10:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti7004.magru.wmnet to cluster magru03 and group B
  • 10:05 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:05 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1104 to an-backup-datanode1034 - btullis@cumin1003"
  • 10:05 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1103 to an-backup-namenode1033
  • 10:01 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru03 and group B
  • 09:52 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti7004.magru.wmnet to cluster magru03 and group B
  • 09:51 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti7004.magru.wmnet to cluster magru03 and group B
  • 09:51 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1105 to an-backup-namenode1035
  • 09:51 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-namenode1035
  • 09:49 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:49 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-namenode1035
  • 09:49 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-namenode1035 on all recursors
  • 09:49 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-namenode1035 on all recursors
  • 09:49 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:49 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1105 to an-backup-namenode1035 - btullis@cumin1003"
  • 09:49 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1104 to an-backup-datanode1034
  • 09:49 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1105 to an-backup-namenode1035 - btullis@cumin1003"
  • 09:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1046.eqiad.wmnet with reason: host reimage
  • 09:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1106 to an-backup-datanode1036
  • 09:47 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1036
  • 09:45 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1046.eqiad.wmnet with reason: host reimage
  • 09:43 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1036
  • 09:43 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1036 on all recursors
  • 09:43 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1036 on all recursors
  • 09:43 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:43 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1106 to an-backup-datanode1036 - btullis@cumin1003"
  • 09:43 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:42 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1106 to an-backup-datanode1036 - btullis@cumin1003"
  • 09:42 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1105 to an-backup-namenode1035
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1107 to an-backup-namenode1037
  • 09:40 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-namenode1037
  • 09:38 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-namenode1037
  • 09:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-namenode1037 on all recursors
  • 09:38 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:38 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-namenode1037 on all recursors
  • 09:38 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1107 to an-backup-namenode1037 - btullis@cumin1003"
  • 09:38 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1107 to an-backup-namenode1037 - btullis@cumin1003"
  • 09:38 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1106 to an-backup-datanode1036
  • 09:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1108 to an-backup-datanode1038
  • 09:34 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1038
  • 09:33 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:33 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1038
  • 09:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1038 on all recursors
  • 09:33 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1038 on all recursors
  • 09:33 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1108 to an-backup-datanode1038 - btullis@cumin1003"
  • 09:33 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1108 to an-backup-datanode1038 - btullis@cumin1003"
  • 09:30 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1107 to an-backup-namenode1037
  • 09:28 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:28 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1108 to an-backup-datanode1038
  • 09:25 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1109 to an-backup-datanode1039
  • 09:24 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1039
  • 09:21 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1039
  • 09:21 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1039 on all recursors
  • 09:21 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1039 on all recursors
  • 09:21 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:21 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1109 to an-backup-datanode1039 - btullis@cumin1003"
  • 09:20 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1109 to an-backup-datanode1039 - btullis@cumin1003"
  • 09:19 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 09:19 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 09:15 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 09:12 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:12 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1109 to an-backup-datanode1039
  • 09:05 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1110 to an-backup-datanode1040
  • 09:05 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1040
  • 09:03 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1040
  • 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1040 on all recursors
  • 09:03 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1040 on all recursors
  • 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:03 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1110 to an-backup-datanode1040 - btullis@cumin1003"
  • 08:52 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1110 to an-backup-datanode1040 - btullis@cumin1003"
  • 08:46 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 08:45 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1110 to an-backup-datanode1040
  • 08:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 08:45 hashar@deploy1003: Finished deploy [integration/docroot@af6fb25]: dev: Simplify router.php a bit (duration: 00m 13s)
  • 08:45 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 08:44 hashar@deploy1003: Started deploy [integration/docroot@af6fb25]: dev: Simplify router.php a bit
  • 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1111 to an-backup-datanode1041
  • 08:44 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1041
  • 08:43 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1041
  • 08:42 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1041 on all recursors
  • 08:42 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1041 on all recursors
  • 08:42 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:42 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1111 to an-backup-datanode1041 - btullis@cumin1003"
  • 08:40 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1111 to an-backup-datanode1041 - btullis@cumin1003"
  • 08:34 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 08:33 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1111 to an-backup-datanode1041
  • 07:56 kartik@deploy1003: Finished scap sync-world: Backport for Disable NewUserMessage extension on hiwiki (T402047) (duration: 14m 21s)
  • 07:49 kartik@deploy1003: kartik, dreamrimmer: Continuing with sync
  • 07:46 kartik@deploy1003: kartik, dreamrimmer: Backport for Disable NewUserMessage extension on hiwiki (T402047) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:43 dcausse: T401633: creating archive index on tlwikisource, zghwiktionary, rkiwiki, minwikibooks and madwikisource
  • 07:42 kartik@deploy1003: Started scap sync-world: Backport for Disable NewUserMessage extension on hiwiki (T402047)
  • 07:39 kartik@deploy1003: Finished scap sync-world: Backport for Make MT limit 80% on Welch Wikipedia (T385482) (duration: 37m 18s)
  • 07:26 kartik@deploy1003: kartik, wangombe: Continuing with sync
  • 07:24 kartik@deploy1003: kartik, wangombe: Backport for Make MT limit 80% on Welch Wikipedia (T385482) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:02 kartik@deploy1003: Started scap sync-world: Backport for Make MT limit 80% on Welch Wikipedia (T385482)
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 45s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-17

  • 05:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T400854)', diff saved to https://phabricator.wikimedia.org/P81422 and previous config saved to /var/cache/conftool/dbconfig/20250817-054629-ladsgroup.json
  • 05:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P81421 and previous config saved to /var/cache/conftool/dbconfig/20250817-053122-ladsgroup.json
  • 05:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P81420 and previous config saved to /var/cache/conftool/dbconfig/20250817-051615-ladsgroup.json
  • 05:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T400854)', diff saved to https://phabricator.wikimedia.org/P81419 and previous config saved to /var/cache/conftool/dbconfig/20250817-050107-ladsgroup.json
  • 04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T400854)', diff saved to https://phabricator.wikimedia.org/P81418 and previous config saved to /var/cache/conftool/dbconfig/20250817-045722-ladsgroup.json
  • 04:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T400854)', diff saved to https://phabricator.wikimedia.org/P81417 and previous config saved to /var/cache/conftool/dbconfig/20250817-045318-ladsgroup.json
  • 04:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P81416 and previous config saved to /var/cache/conftool/dbconfig/20250817-043811-ladsgroup.json
  • 04:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P81415 and previous config saved to /var/cache/conftool/dbconfig/20250817-042303-ladsgroup.json
  • 04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T400854)', diff saved to https://phabricator.wikimedia.org/P81414 and previous config saved to /var/cache/conftool/dbconfig/20250817-040755-ladsgroup.json
  • 04:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T400854)', diff saved to https://phabricator.wikimedia.org/P81413 and previous config saved to /var/cache/conftool/dbconfig/20250817-040410-ladsgroup.json
  • 04:04 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T400854)', diff saved to https://phabricator.wikimedia.org/P81412 and previous config saved to /var/cache/conftool/dbconfig/20250817-040347-ladsgroup.json
  • 03:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P81411 and previous config saved to /var/cache/conftool/dbconfig/20250817-034839-ladsgroup.json
  • 03:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P81410 and previous config saved to /var/cache/conftool/dbconfig/20250817-033332-ladsgroup.json
  • 03:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T400854)', diff saved to https://phabricator.wikimedia.org/P81409 and previous config saved to /var/cache/conftool/dbconfig/20250817-031824-ladsgroup.json
  • 03:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T400854)', diff saved to https://phabricator.wikimedia.org/P81408 and previous config saved to /var/cache/conftool/dbconfig/20250817-031436-ladsgroup.json
  • 03:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 03:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81407 and previous config saved to /var/cache/conftool/dbconfig/20250817-031414-ladsgroup.json
  • 02:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P81406 and previous config saved to /var/cache/conftool/dbconfig/20250817-025906-ladsgroup.json
  • 02:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P81405 and previous config saved to /var/cache/conftool/dbconfig/20250817-024359-ladsgroup.json
  • 02:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81404 and previous config saved to /var/cache/conftool/dbconfig/20250817-022851-ladsgroup.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81403 and previous config saved to /var/cache/conftool/dbconfig/20250817-022508-ladsgroup.json
  • 02:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 02:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T400854)', diff saved to https://phabricator.wikimedia.org/P81402 and previous config saved to /var/cache/conftool/dbconfig/20250817-022445-ladsgroup.json
  • 02:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P81401 and previous config saved to /var/cache/conftool/dbconfig/20250817-020937-ladsgroup.json
  • 01:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P81400 and previous config saved to /var/cache/conftool/dbconfig/20250817-015430-ladsgroup.json
  • 01:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T400854)', diff saved to https://phabricator.wikimedia.org/P81399 and previous config saved to /var/cache/conftool/dbconfig/20250817-013922-ladsgroup.json
  • 01:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T400854)', diff saved to https://phabricator.wikimedia.org/P81398 and previous config saved to /var/cache/conftool/dbconfig/20250817-013537-ladsgroup.json
  • 01:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 01:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81397 and previous config saved to /var/cache/conftool/dbconfig/20250817-013525-ladsgroup.json
  • 01:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P81396 and previous config saved to /var/cache/conftool/dbconfig/20250817-012017-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 46s)
  • 01:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P81395 and previous config saved to /var/cache/conftool/dbconfig/20250817-010510-ladsgroup.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81394 and previous config saved to /var/cache/conftool/dbconfig/20250817-005002-ladsgroup.json
  • 00:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81393 and previous config saved to /var/cache/conftool/dbconfig/20250817-004616-ladsgroup.json
  • 00:46 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T400854)', diff saved to https://phabricator.wikimedia.org/P81392 and previous config saved to /var/cache/conftool/dbconfig/20250817-004231-ladsgroup.json
  • 00:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P81391 and previous config saved to /var/cache/conftool/dbconfig/20250817-002723-ladsgroup.json
  • 00:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P81390 and previous config saved to /var/cache/conftool/dbconfig/20250817-001216-ladsgroup.json

2025-08-16

  • 23:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T400854)', diff saved to https://phabricator.wikimedia.org/P81389 and previous config saved to /var/cache/conftool/dbconfig/20250816-235708-ladsgroup.json
  • 23:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T400854)', diff saved to https://phabricator.wikimedia.org/P81388 and previous config saved to /var/cache/conftool/dbconfig/20250816-235253-ladsgroup.json
  • 23:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 23:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T400854)', diff saved to https://phabricator.wikimedia.org/P81387 and previous config saved to /var/cache/conftool/dbconfig/20250816-235229-ladsgroup.json
  • 23:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P81386 and previous config saved to /var/cache/conftool/dbconfig/20250816-233722-ladsgroup.json
  • 23:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P81385 and previous config saved to /var/cache/conftool/dbconfig/20250816-232214-ladsgroup.json
  • 23:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T400854)', diff saved to https://phabricator.wikimedia.org/P81384 and previous config saved to /var/cache/conftool/dbconfig/20250816-230707-ladsgroup.json
  • 23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T400854)', diff saved to https://phabricator.wikimedia.org/P81383 and previous config saved to /var/cache/conftool/dbconfig/20250816-230254-ladsgroup.json
  • 23:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 23:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T400854)', diff saved to https://phabricator.wikimedia.org/P81382 and previous config saved to /var/cache/conftool/dbconfig/20250816-230231-ladsgroup.json
  • 22:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P81381 and previous config saved to /var/cache/conftool/dbconfig/20250816-224723-ladsgroup.json
  • 22:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P81380 and previous config saved to /var/cache/conftool/dbconfig/20250816-223215-ladsgroup.json
  • 22:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T400854)', diff saved to https://phabricator.wikimedia.org/P81379 and previous config saved to /var/cache/conftool/dbconfig/20250816-221708-ladsgroup.json
  • 22:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T400854)', diff saved to https://phabricator.wikimedia.org/P81378 and previous config saved to /var/cache/conftool/dbconfig/20250816-221254-ladsgroup.json
  • 22:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 01:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T399249)', diff saved to https://phabricator.wikimedia.org/P81377 and previous config saved to /var/cache/conftool/dbconfig/20250816-013415-fceratto.json
  • 01:24 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2055.codfw.wmnet with OS bookworm
  • 01:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P81376 and previous config saved to /var/cache/conftool/dbconfig/20250816-011908-fceratto.json
  • 01:14 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
  • 01:11 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2052.codfw.wmnet with OS bookworm
  • 01:11 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 01:11 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
  • 01:10 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 01:08 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2055.codfw.wmnet with OS bookworm
  • 01:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240', diff saved to https://phabricator.wikimedia.org/P81375 and previous config saved to /var/cache/conftool/dbconfig/20250816-010400-fceratto.json
  • 01:03 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2055.codfw.wmnet with OS bookworm
  • 01:00 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2053.codfw.wmnet with OS bookworm
  • 01:00 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 01:00 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2056.codfw.wmnet with OS bookworm
  • 00:54 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:53 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:53 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2054.codfw.wmnet with OS bookworm
  • 00:53 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:53 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:52 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2052.codfw.wmnet with reason: host reimage
  • 00:51 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2057.codfw.wmnet with OS bookworm
  • 00:51 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:50 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2240 (T399249)', diff saved to https://phabricator.wikimedia.org/P81374 and previous config saved to /var/cache/conftool/dbconfig/20250816-004852-fceratto.json
  • 00:48 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2051.codfw.wmnet with OS bookworm
  • 00:48 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:47 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 00:45 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
  • 00:41 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2053.codfw.wmnet with reason: host reimage
  • 00:38 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2056.codfw.wmnet with reason: host reimage
  • 00:34 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2054.codfw.wmnet with reason: host reimage
  • 00:32 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2057.codfw.wmnet with reason: host reimage
  • 00:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2051.codfw.wmnet with reason: host reimage
  • 00:26 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2057.codfw.wmnet with reason: host reimage
  • 00:26 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2056.codfw.wmnet with reason: host reimage
  • 00:25 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2055.codfw.wmnet with reason: host reimage
  • 00:25 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2054.codfw.wmnet with reason: host reimage
  • 00:25 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2053.codfw.wmnet with reason: host reimage
  • 00:25 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2052.codfw.wmnet with reason: host reimage
  • 00:25 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on es2051.codfw.wmnet with reason: host reimage
  • 00:08 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2057.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2056.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2055.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2054.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2053.codfw.wmnet with OS bookworm
  • 00:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2052.codfw.wmnet with OS bookworm
  • 00:06 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host es2051.codfw.wmnet with OS bookworm

2025-08-15

  • 23:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:57 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:57 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:56 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:56 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:56 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:54 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:54 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:53 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:53 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:52 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:52 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:51 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:06 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:06 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:05 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:04 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:04 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:58 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2050.codfw.wmnet with OS bookworm
  • 22:58 jhancock@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 22:58 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1002"
  • 22:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:53 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2057.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:53 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2056.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:52 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2055.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:52 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2054.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:51 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2053.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:51 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2052.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:50 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host es2051.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:42 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2050.codfw.wmnet with reason: host reimage
  • 22:39 jhancock@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2050.codfw.wmnet with reason: host reimage
  • 22:33 sbassett: Deployed updated security mitigation for T401266
  • 22:33 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2057
  • 22:33 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2057
  • 22:33 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2056
  • 22:33 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2056
  • 22:32 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2055
  • 22:32 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2055
  • 22:32 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2054
  • 22:32 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2054
  • 22:32 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2053
  • 22:32 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2053
  • 22:32 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2052
  • 22:31 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2052
  • 22:31 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2051
  • 22:31 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host es2051
  • 22:31 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:31 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2051-7 to codfw - jhancock@cumin1003"
  • 22:31 dzahn@dns1004: END - running authdns-update
  • 22:31 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2051-7 to codfw - jhancock@cumin1003"
  • 22:30 dzahn@dns1004: START - running authdns-update
  • 22:27 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 22:22 denisse: Remove log debug file from host - T383309
  • 22:20 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host es2050.codfw.wmnet with OS bookworm
  • 18:54 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:52 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:51 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:46 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:24 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:18 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:15 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:09 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host cirrussearch2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:40 wfan: civicrm upgraded from 5949c641 to 25bc1c7b
  • 17:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2240 (T399249)', diff saved to https://phabricator.wikimedia.org/P81373 and previous config saved to /var/cache/conftool/dbconfig/20250815-173159-fceratto.json
  • 17:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2240.codfw.wmnet with reason: Maintenance
  • 16:51 kemayo@deploy1003: Finished scap sync-world: Backport for Avoid error when switching to source editing (T402024) (duration: 08m 01s)
  • 16:45 kemayo@deploy1003: kemayo: Continuing with sync
  • 16:45 kemayo@deploy1003: kemayo: Backport for Avoid error when switching to source editing (T402024) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:42 kemayo@deploy1003: Started scap sync-world: Backport for Avoid error when switching to source editing (T402024)
  • 16:21 ammarpad@deploy1003: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=loginwiki --logwiki=metawiki 'Band of the Falcon' 'Griffith the Hero' # T401997
  • 10:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 10:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T399249)', diff saved to https://phabricator.wikimedia.org/P81371 and previous config saved to /var/cache/conftool/dbconfig/20250815-105424-fceratto.json
  • 10:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P81370 and previous config saved to /var/cache/conftool/dbconfig/20250815-103917-fceratto.json
  • 10:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P81369 and previous config saved to /var/cache/conftool/dbconfig/20250815-102408-fceratto.json
  • 10:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T399249)', diff saved to https://phabricator.wikimedia.org/P81368 and previous config saved to /var/cache/conftool/dbconfig/20250815-100901-fceratto.json
  • 10:06 zabe@deploy1003: Finished scap sync-world: Backport for Migrate overlooked query to categorylinks read new (T401951) (duration: 08m 17s)
  • 10:00 zabe@deploy1003: zabe: Continuing with sync
  • 09:59 zabe@deploy1003: zabe: Backport for Migrate overlooked query to categorylinks read new (T401951) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:59 moritzm: uploaded openjdk-8 8u462-ga-1 to bookworm (backport of latest Java 8 security fixes)
  • 09:57 zabe@deploy1003: Started scap sync-world: Backport for Migrate overlooked query to categorylinks read new (T401951)
  • 09:10 taavi: update python3-flask-keystone in trixie-wikimedia T401986
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1013.eqiad.wmnet
  • 07:02 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1013.eqiad.wmnet
  • 04:39 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 04:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 04:13 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 03:45 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 03:44 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 03:03 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 03:01 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 02:44 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2237 (T399249)', diff saved to https://phabricator.wikimedia.org/P81367 and previous config saved to /var/cache/conftool/dbconfig/20250815-024449-fceratto.json
  • 02:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2237.codfw.wmnet with reason: Maintenance
  • 02:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T399249)', diff saved to https://phabricator.wikimedia.org/P81366 and previous config saved to /var/cache/conftool/dbconfig/20250815-024426-fceratto.json
  • 02:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 02:35 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 02:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P81365 and previous config saved to /var/cache/conftool/dbconfig/20250815-022919-fceratto.json
  • 02:25 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1047.eqiad.wmnet with OS bullseye
  • 02:22 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1046.eqiad.wmnet with OS bullseye
  • 02:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P81364 and previous config saved to /var/cache/conftool/dbconfig/20250815-021412-fceratto.json
  • 02:07 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 02:03 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 02:03 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 02:02 ejegg: payments-wiki upgraded from 5c5b4637 to 03e8e38f
  • 01:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T399249)', diff saved to https://phabricator.wikimedia.org/P81363 and previous config saved to /var/cache/conftool/dbconfig/20250815-015904-fceratto.json
  • 01:58 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 01:36 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bullseye
  • 01:31 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1046.eqiad.wmnet with OS bullseye
  • 01:23 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bullseye
  • 01:17 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1046.eqiad.wmnet with OS bullseye
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 50s)
  • 01:09 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-14

  • 23:48 rzl@deploy1003: Finished scap sync-world: https://gerrit.wikimedia.org/r/1178666 (duration: 03m 28s)
  • 23:46 rzl@deploy1003: Started scap sync-world: https://gerrit.wikimedia.org/r/1178666
  • 23:41 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 22:14 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 22:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host es2049.codfw.wmnet with OS bookworm
  • 22:09 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 22:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 21:53 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 21:51 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 21:48 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es2049.codfw.wmnet with reason: host reimage
  • 21:43 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on es2049.codfw.wmnet with reason: host reimage
  • 21:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host es2049.codfw.wmnet with OS bookworm
  • 21:33 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 21:30 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 21:12 ejegg: re-enabled fundraising scheduled jobs
  • 21:03 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 20:58 jhuneidi@deploy1003: Finished scap sync-world: Backport for MediaWiki.org: Restrict creation of empty categories using Translate (T401878) (duration: 13m 36s)
  • 20:52 jhuneidi@deploy1003: jhuneidi, pppery: Continuing with sync
  • 20:48 dzahn@dns1004: END - running authdns-update
  • 20:47 dzahn@dns1004: START - running authdns-update
  • 20:46 jhuneidi@deploy1003: jhuneidi, pppery: Backport for MediaWiki.org: Restrict creation of empty categories using Translate (T401878) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:44 jhuneidi@deploy1003: Started scap sync-world: Backport for MediaWiki.org: Restrict creation of empty categories using Translate (T401878)
  • 20:37 ejegg: payments-wiki upgraded from cd876775 to 5c5b4637
  • 20:36 jhuneidi@deploy1003: Finished scap sync-world: Backport for wikimaniawiki: remove 2026-2028 namespace protection (T401948), wikimaniawiki: update extendedconfirmed promotion config (T401537) (duration: 10m 12s)
  • 20:31 jhuneidi@deploy1003: robertsky, jhuneidi: Continuing with sync
  • 20:28 jhuneidi@deploy1003: robertsky, jhuneidi: Backport for wikimaniawiki: remove 2026-2028 namespace protection (T401948), wikimaniawiki: update extendedconfirmed promotion config (T401537) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 ejegg: standalone SmashPig upgraded from 83293ee1 to 3d1d6f15
  • 20:27 ejegg: fundraising civicrm upgraded from 321a17c0 to 5949c641
  • 20:26 jhuneidi@deploy1003: Started scap sync-world: Backport for wikimaniawiki: remove 2026-2028 namespace protection (T401948), wikimaniawiki: update extendedconfirmed promotion config (T401537)
  • 20:24 ejegg: stopped scheduled fundraising jobs for deployment
  • 20:21 mstyles@deploy1003: Finished scap sync-world: Backport for WebAuthn: Limit passkeys to roaming (T399665) (duration: 10m 23s)
  • 20:16 mstyles@deploy1003: mstyles: Continuing with sync
  • 20:13 mstyles@deploy1003: mstyles: Backport for WebAuthn: Limit passkeys to roaming (T399665) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:11 mstyles@deploy1003: Started scap sync-world: Backport for WebAuthn: Limit passkeys to roaming (T399665)
  • 20:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2049.codfw.wmnet with OS bookworm
  • 19:47 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 19:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host es2049.codfw.wmnet with OS bookworm
  • 18:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2236 (T399249)', diff saved to https://phabricator.wikimedia.org/P81361 and previous config saved to /var/cache/conftool/dbconfig/20250814-185410-fceratto.json
  • 18:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2236.codfw.wmnet with reason: Maintenance
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81360 and previous config saved to /var/cache/conftool/dbconfig/20250814-185348-fceratto.json
  • 18:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P81359 and previous config saved to /var/cache/conftool/dbconfig/20250814-183840-fceratto.json
  • 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P81358 and previous config saved to /var/cache/conftool/dbconfig/20250814-182333-fceratto.json
  • 18:10 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.14 refs T396375
  • 18:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81357 and previous config saved to /var/cache/conftool/dbconfig/20250814-180825-fceratto.json
  • 18:07 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 17:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1112 to an-backup-datanode1042
  • 17:47 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1042
  • 17:37 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2050.codfw.wmnet with OS bookworm
  • 17:02 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:01 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:01 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 17:01 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:00 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:55 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on medium wikis (T399579) (duration: 08m 23s)
  • 16:49 zabe@deploy1003: zabe: Continuing with sync
  • 16:48 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on medium wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-ntp (exit_code=0) rolling restart_daemons on A:dnsbox
  • 16:46 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on medium wikis (T399579)
  • 16:27 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1042
  • 16:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1042 on all recursors
  • 16:27 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1042 on all recursors
  • 16:27 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1112 to an-backup-datanode1042 - btullis@cumin1003"
  • 16:27 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1112 to an-backup-datanode1042 - btullis@cumin1003"
  • 16:18 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host es2050.codfw.wmnet with OS bookworm
  • 16:17 jhancock@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2050']
  • 16:17 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2050']
  • 16:15 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:15 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1112 to an-backup-datanode1042
  • 16:14 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:13 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1113 to an-backup-datanode1043
  • 16:13 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1043
  • 16:12 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1043
  • 16:12 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1043 on all recursors
  • 16:12 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1043 on all recursors
  • 16:12 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:12 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1113 to an-backup-datanode1043 - btullis@cumin1003"
  • 16:10 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host es2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:10 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1113 to an-backup-datanode1043 - btullis@cumin1003"
  • 16:06 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:06 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1113 to an-backup-datanode1043
  • 16:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrading to Java 11.0.28 - eevans@cumin1002
  • 16:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1114 to an-backup-datanode1044
  • 16:00 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1044
  • 15:59 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1044
  • 15:59 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1044 on all recursors
  • 15:59 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1044 on all recursors
  • 15:59 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1114 to an-backup-datanode1044 - btullis@cumin1003"
  • 15:58 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:58 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1114 to an-backup-datanode1044 - btullis@cumin1003"
  • 15:55 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 15:55 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 15:54 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:54 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1114 to an-backup-datanode1044
  • 15:54 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1046.eqiad.wmnet with OS bookworm
  • 15:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1115 to an-backup-datanode1045
  • 15:51 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1045
  • 15:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 15:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 15:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 15:48 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 15:46 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host es2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:45 jhancock@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2050
  • 15:45 jhancock@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es2050
  • 15:45 jhancock@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2050 to codfw - jhancock@cumin1002"
  • 15:45 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2050 to codfw - jhancock@cumin1002"
  • 15:41 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1045
  • 15:41 jhancock@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1045 on all recursors
  • 15:41 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1045 on all recursors
  • 15:41 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:41 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1115 to an-backup-datanode1045 - btullis@cumin1003"
  • 15:37 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1115 to an-backup-datanode1045 - btullis@cumin1003"
  • 15:32 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:32 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1115 to an-backup-datanode1045
  • 15:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1116 to an-backup-datanode1046
  • 15:31 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1046
  • 15:30 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1046
  • 15:30 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-datanode1046 on all recursors
  • 15:30 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-datanode1046 on all recursors
  • 15:29 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1116 to an-backup-datanode1046 - btullis@cumin1003"
  • 15:29 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1116 to an-backup-datanode1046 - btullis@cumin1003"
  • 15:22 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2144.codfw.wmnet
  • 15:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1151.eqiad.wmnet
  • 15:17 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:17 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1116 to an-backup-datanode1046
  • 15:15 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2144.codfw.wmnet
  • 15:14 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1151.eqiad.wmnet
  • 15:11 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 15:11 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 15:10 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 15:10 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 15:01 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:01 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 15:01 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 14:58 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 14:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti7004.magru.wmnet with OS bookworm
  • 14:54 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
  • 14:41 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
  • 14:33 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
  • 14:32 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 14:32 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 14:31 moritzm: installing libxml2 security updates
  • 14:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti7004.magru.wmnet with reason: host reimage
  • 14:29 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 14:29 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 14:28 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 14:28 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 14:23 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1153.eqiad.wmnet
  • 14:16 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1153.eqiad.wmnet
  • 14:10 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2143.codfw.wmnet
  • 14:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti7004.magru.wmnet with OS bookworm
  • 14:03 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2143.codfw.wmnet
  • 14:03 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 14:03 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 13:59 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes rkiwiki --fix # T392499
  • 13:55 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:55 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for minwikibooks: add logos (T395499), rkiwiki: add logos (T392499) (duration: 08m 54s)
  • 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:51 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: removed VIP for magru02 - jmm@cumin2002"
  • 13:51 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: removed VIP for magru02 - jmm@cumin2002"
  • 13:49 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Continuing with sync
  • 13:48 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Backport for minwikibooks: add logos (T395499), rkiwiki: add logos (T392499) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:46 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for minwikibooks: add logos (T395499), rkiwiki: add logos (T392499)
  • 13:45 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:43 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835) (duration: 17m 06s)
  • 13:38 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, hokwelum: Continuing with sync
  • 13:28 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, hokwelum: Backport for ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:26 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for ResourceLoader: Temporily track cache usage of preloaded NS_USER title info (T393835)
  • 13:25 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for rkiwiki: set sitename, project namespace and time zone (T392499) (duration: 10m 00s)
  • 13:19 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Continuing with sync
  • 13:17 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Backport for rkiwiki: set sitename, project namespace and time zone (T392499) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:15 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts durum7002.magru.wmnet
  • 13:15 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:15 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for rkiwiki: set sitename, project namespace and time zone (T392499)
  • 13:13 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:12 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts doh7002.wikimedia.org
  • 13:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:12 sukhe@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 13:12 sukhe@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: doh7002.wikimedia.org decommissioned, removing all IPs except the asset tag one - sukhe@cumin1003"
  • 13:08 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 13:04 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts durum7002.magru.wmnet
  • 13:03 sukhe@cumin1003: START - Cookbook sre.hosts.decommission for hosts doh7002.wikimedia.org
  • 12:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 12:57 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 12:57 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 12:57 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 12:55 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-ntp rolling restart_daemons on A:dnsbox
  • 12:54 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d6-eqiad
  • 12:54 cmooney@cumin1003: START - Cookbook sre.network.tls for network device lsw1-d6-eqiad
  • 12:50 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent": T401848
  • 12:47 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1152.eqiad.wmnet
  • 12:39 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1152.eqiad.wmnet
  • 12:12 moritzm: installing PHP 7.4 security updates
  • 12:03 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2142.codfw.wmnet
  • 11:55 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2142.codfw.wmnet
  • 11:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 11:45 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 11:41 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
  • 11:33 logmsgbot: mszabo Deployed security patch for T280413
  • 11:28 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2142.codfw.wmnet
  • 11:28 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db2142 - Upgrading db2142.codfw.wmnet
  • 11:28 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2142 - Upgrading db2142.codfw.wmnet
  • 11:27 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2142.codfw.wmnet
  • 11:23 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-datanode1001.eqiad.wmnet with reason: host reimage
  • 11:23 taavi: copy thanos package to trixie-wikimedia T401813
  • 11:19 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-datanode1001.eqiad.wmnet with reason: host reimage
  • 11:13 moritzm: installing openssl security updates
  • 11:08 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
  • 11:08 fceratto@cumin1002: START - Cookbook sre.mysql.parsercache
  • 10:55 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81350 and previous config saved to /var/cache/conftool/dbconfig/20250814-105514-fceratto.json
  • 10:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 10:54 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 10:49 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 10:37 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 10:04 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 10:00 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host thanos-be1005.eqiad.wmnet with OS bullseye
  • 09:43 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
  • 09:43 moritzm: installing Java 17 security updates
  • 09:41 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:39 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:39 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:39 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on thanos-be1005.eqiad.wmnet with reason: host reimage
  • 09:38 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:17 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host thanos-be1005.eqiad.wmnet with OS bullseye
  • 09:07 moritzm: installing Java 8 security updates on Bullseye
  • 08:59 moritzm: uploaded openjdk-8 8u462-ga-1 to bullseye-wikimedia (backport of latest Java 8 security fixes)
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
  • 08:34 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 08:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 08:31 XioNoX: lsw1-d2-codfw> restart jsd gracefully
  • 08:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 08:27 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 08:26 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
  • 08:26 moritzm: installing Java 8 security updates on kafka-test*
  • 07:50 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 07:49 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 07:19 gkyziridis@deploy1003: Finished scap sync-world: Backport for ores-extension: add threshold for revertrisk in enwiki (T400590) (duration: 12m 07s)
  • 07:14 gkyziridis@deploy1003: gkyziridis, isaranto: Continuing with sync
  • 07:09 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 07:09 gkyziridis@deploy1003: gkyziridis, isaranto: Backport for ores-extension: add threshold for revertrisk in enwiki (T400590) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T400854)', diff saved to https://phabricator.wikimedia.org/P81344 and previous config saved to /var/cache/conftool/dbconfig/20250814-070932-ladsgroup.json
  • 07:07 gkyziridis@deploy1003: Started scap sync-world: Backport for ores-extension: add threshold for revertrisk in enwiki (T400590)
  • 06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P81343 and previous config saved to /var/cache/conftool/dbconfig/20250814-065424-ladsgroup.json
  • 06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P81342 and previous config saved to /var/cache/conftool/dbconfig/20250814-063916-ladsgroup.json
  • 06:32 XioNoX: lsw1-d2-codfw> restart analytics-agent gracefully
  • 06:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T400854)', diff saved to https://phabricator.wikimedia.org/P81341 and previous config saved to /var/cache/conftool/dbconfig/20250814-062409-ladsgroup.json
  • 06:23 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262725
  • 06:22 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 262725
  • 06:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1252 (T400854)', diff saved to https://phabricator.wikimedia.org/P81340 and previous config saved to /var/cache/conftool/dbconfig/20250814-061838-ladsgroup.json
  • 06:18 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1252.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T400854)', diff saved to https://phabricator.wikimedia.org/P81339 and previous config saved to /var/cache/conftool/dbconfig/20250814-061816-ladsgroup.json
  • 06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81338 and previous config saved to /var/cache/conftool/dbconfig/20250814-060308-ladsgroup.json
  • 05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P81337 and previous config saved to /var/cache/conftool/dbconfig/20250814-054800-ladsgroup.json
  • 05:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T400854)', diff saved to https://phabricator.wikimedia.org/P81336 and previous config saved to /var/cache/conftool/dbconfig/20250814-053252-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T400854)', diff saved to https://phabricator.wikimedia.org/P81335 and previous config saved to /var/cache/conftool/dbconfig/20250814-052831-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T400854)', diff saved to https://phabricator.wikimedia.org/P81334 and previous config saved to /var/cache/conftool/dbconfig/20250814-052809-ladsgroup.json
  • 05:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P81333 and previous config saved to /var/cache/conftool/dbconfig/20250814-051301-ladsgroup.json
  • 04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P81332 and previous config saved to /var/cache/conftool/dbconfig/20250814-045753-ladsgroup.json
  • 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T400854)', diff saved to https://phabricator.wikimedia.org/P81331 and previous config saved to /var/cache/conftool/dbconfig/20250814-044246-ladsgroup.json
  • 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T400854)', diff saved to https://phabricator.wikimedia.org/P81330 and previous config saved to /var/cache/conftool/dbconfig/20250814-043732-ladsgroup.json
  • 04:37 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 04:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T400854)', diff saved to https://phabricator.wikimedia.org/P81329 and previous config saved to /var/cache/conftool/dbconfig/20250814-043719-ladsgroup.json
  • 04:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P81328 and previous config saved to /var/cache/conftool/dbconfig/20250814-042211-ladsgroup.json
  • 04:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P81327 and previous config saved to /var/cache/conftool/dbconfig/20250814-040703-ladsgroup.json
  • 03:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T400854)', diff saved to https://phabricator.wikimedia.org/P81326 and previous config saved to /var/cache/conftool/dbconfig/20250814-035155-ladsgroup.json
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T400854)', diff saved to https://phabricator.wikimedia.org/P81325 and previous config saved to /var/cache/conftool/dbconfig/20250814-034734-ladsgroup.json
  • 03:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 03:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T400854)', diff saved to https://phabricator.wikimedia.org/P81324 and previous config saved to /var/cache/conftool/dbconfig/20250814-034332-ladsgroup.json
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P81323 and previous config saved to /var/cache/conftool/dbconfig/20250814-032824-ladsgroup.json
  • 03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P81322 and previous config saved to /var/cache/conftool/dbconfig/20250814-031316-ladsgroup.json
  • 02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T400854)', diff saved to https://phabricator.wikimedia.org/P81321 and previous config saved to /var/cache/conftool/dbconfig/20250814-025808-ladsgroup.json
  • 02:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T400854)', diff saved to https://phabricator.wikimedia.org/P81320 and previous config saved to /var/cache/conftool/dbconfig/20250814-025323-ladsgroup.json
  • 02:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 02:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T400854)', diff saved to https://phabricator.wikimedia.org/P81319 and previous config saved to /var/cache/conftool/dbconfig/20250814-025300-ladsgroup.json
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P81318 and previous config saved to /var/cache/conftool/dbconfig/20250814-023753-ladsgroup.json
  • 02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P81317 and previous config saved to /var/cache/conftool/dbconfig/20250814-022245-ladsgroup.json
  • 02:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T400854)', diff saved to https://phabricator.wikimedia.org/P81316 and previous config saved to /var/cache/conftool/dbconfig/20250814-020737-ladsgroup.json
  • 02:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T400854)', diff saved to https://phabricator.wikimedia.org/P81315 and previous config saved to /var/cache/conftool/dbconfig/20250814-020228-ladsgroup.json
  • 02:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T400854)', diff saved to https://phabricator.wikimedia.org/P81314 and previous config saved to /var/cache/conftool/dbconfig/20250814-020205-ladsgroup.json
  • 01:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P81313 and previous config saved to /var/cache/conftool/dbconfig/20250814-014657-ladsgroup.json
  • 01:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P81312 and previous config saved to /var/cache/conftool/dbconfig/20250814-013149-ladsgroup.json
  • 01:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T400854)', diff saved to https://phabricator.wikimedia.org/P81311 and previous config saved to /var/cache/conftool/dbconfig/20250814-011642-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 48s)
  • 01:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T400854)', diff saved to https://phabricator.wikimedia.org/P81310 and previous config saved to /var/cache/conftool/dbconfig/20250814-011215-ladsgroup.json
  • 01:12 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 01:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T400854)', diff saved to https://phabricator.wikimedia.org/P81309 and previous config saved to /var/cache/conftool/dbconfig/20250814-011152-ladsgroup.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P81308 and previous config saved to /var/cache/conftool/dbconfig/20250814-005644-ladsgroup.json
  • 00:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P81307 and previous config saved to /var/cache/conftool/dbconfig/20250814-004137-ladsgroup.json
  • 00:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T400854)', diff saved to https://phabricator.wikimedia.org/P81306 and previous config saved to /var/cache/conftool/dbconfig/20250814-002629-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T400854)', diff saved to https://phabricator.wikimedia.org/P81305 and previous config saved to /var/cache/conftool/dbconfig/20250814-002159-ladsgroup.json
  • 00:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T400854)', diff saved to https://phabricator.wikimedia.org/P81304 and previous config saved to /var/cache/conftool/dbconfig/20250814-002136-ladsgroup.json
  • 00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P81303 and previous config saved to /var/cache/conftool/dbconfig/20250814-000629-ladsgroup.json

2025-08-13

  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P81300 and previous config saved to /var/cache/conftool/dbconfig/20250813-235121-ladsgroup.json
  • 23:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T400854)', diff saved to https://phabricator.wikimedia.org/P81299 and previous config saved to /var/cache/conftool/dbconfig/20250813-233614-ladsgroup.json
  • 23:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T400854)', diff saved to https://phabricator.wikimedia.org/P81298 and previous config saved to /var/cache/conftool/dbconfig/20250813-233102-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T400854)', diff saved to https://phabricator.wikimedia.org/P81297 and previous config saved to /var/cache/conftool/dbconfig/20250813-233031-ladsgroup.json
  • 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P81296 and previous config saved to /var/cache/conftool/dbconfig/20250813-231523-ladsgroup.json
  • 23:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P81295 and previous config saved to /var/cache/conftool/dbconfig/20250813-230015-ladsgroup.json
  • 22:57 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 22:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T400854)', diff saved to https://phabricator.wikimedia.org/P81294 and previous config saved to /var/cache/conftool/dbconfig/20250813-224508-ladsgroup.json
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T400854)', diff saved to https://phabricator.wikimedia.org/P81293 and previous config saved to /var/cache/conftool/dbconfig/20250813-224044-ladsgroup.json
  • 22:40 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 22:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T400854)', diff saved to https://phabricator.wikimedia.org/P81292 and previous config saved to /var/cache/conftool/dbconfig/20250813-224021-ladsgroup.json
  • 22:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P81291 and previous config saved to /var/cache/conftool/dbconfig/20250813-222513-ladsgroup.json
  • 22:18 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 22:15 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd2004-dev.codfw.wmnet with reason: host reimage
  • 22:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P81290 and previous config saved to /var/cache/conftool/dbconfig/20250813-221006-ladsgroup.json
  • 21:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T400854)', diff saved to https://phabricator.wikimedia.org/P81289 and previous config saved to /var/cache/conftool/dbconfig/20250813-215458-ladsgroup.json
  • 21:54 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-eqiad
  • 21:54 cmooney@cumin1003: START - Cookbook sre.network.tls for network device ssw1-d8-eqiad
  • 21:54 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2004-dev.codfw.wmnet with OS bookworm
  • 21:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T400854)', diff saved to https://phabricator.wikimedia.org/P81288 and previous config saved to /var/cache/conftool/dbconfig/20250813-215003-ladsgroup.json
  • 21:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T400854)', diff saved to https://phabricator.wikimedia.org/P81287 and previous config saved to /var/cache/conftool/dbconfig/20250813-214940-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P81286 and previous config saved to /var/cache/conftool/dbconfig/20250813-213433-ladsgroup.json
  • 21:29 brennen@deploy1003: Installation of scap version "4.201.0" completed for 169 hosts
  • 21:24 brennen@deploy1003: Installing scap version "4.201.0" for 169 host(s)
  • 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P81285 and previous config saved to /var/cache/conftool/dbconfig/20250813-211925-ladsgroup.json
  • 21:14 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki minwikibooks --cluster=all # T395452
  • 21:14 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki madwikisource --cluster=all # T391747
  • 21:13 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki tlwikisource --cluster=all # T388639
  • 21:12 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki zghwiktionary --cluster=all # T399684
  • 21:11 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki rkiwiki --cluster=all # T392490
  • 21:10 zabe@deploy1003: mwscript-k8s job started: extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki rkiw --cluster=all # T392490
  • 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T400854)', diff saved to https://phabricator.wikimedia.org/P81284 and previous config saved to /var/cache/conftool/dbconfig/20250813-210418-ladsgroup.json
  • 21:01 zabe@deploy1003: Finished scap sync-world: Backport for UpdateSearchIndexConfig get the writable clusters not all of them (T401633), UpdateSearchIndexConfig get the writable clusters not all of them (T401633) (duration: 08m 41s)
  • 20:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T400854)', diff saved to https://phabricator.wikimedia.org/P81283 and previous config saved to /var/cache/conftool/dbconfig/20250813-205923-ladsgroup.json
  • 20:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 20:56 zabe@deploy1003: zabe: Continuing with sync
  • 20:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 20:55 zabe@deploy1003: zabe: Backport for UpdateSearchIndexConfig get the writable clusters not all of them (T401633), UpdateSearchIndexConfig get the writable clusters not all of them (T401633) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 20:53 zabe@deploy1003: Started scap sync-world: Backport for UpdateSearchIndexConfig get the writable clusters not all of them (T401633), UpdateSearchIndexConfig get the writable clusters not all of them (T401633)
  • 20:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 20:36 swfrench-wmf: start manual equivalent of imagesuggestions-notifyunillustratedwatched-ca cronjob in --dry-run mode - T368096
  • 20:36 swfrench@deploy1003: mwscript-k8s job started: extensions/ImageSuggestions/maintenance/SendNotificationsForUnillustratedWatchedTitles.php --wiki=cawiki --min-edit-count=500 --min-confidence=80 --max-notifications-per-user=2 --exclude-instance-of=Q5 --queue --quiet --dry-run
  • 20:27 swfrench@deploy1003: Finished scap sync-world: Backport for Reduce log level to 'info' on ImageSuggestions (T368096) (duration: 09m 22s)
  • 20:26 eevans@deploy1003: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:24 eevans@deploy1003: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:21 swfrench@deploy1003: swfrench: Continuing with sync
  • 20:21 eevans@deploy1003: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:21 eevans@deploy1003: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:20 swfrench@deploy1003: swfrench: Backport for Reduce log level to 'info' on ImageSuggestions (T368096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 swfrench@deploy1003: Started scap sync-world: Backport for Reduce log level to 'info' on ImageSuggestions (T368096)
  • 20:13 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: selectionmanager/gutter merge follow-ups (T400905) (duration: 09m 17s)
  • 20:07 kemayo@deploy1003: kemayo: Continuing with sync
  • 20:05 kemayo@deploy1003: kemayo: Backport for Edit check: selectionmanager/gutter merge follow-ups (T400905) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:03 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: selectionmanager/gutter merge follow-ups (T400905)
  • 19:40 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 19:33 aqu@deploy1003: Finished deploy [analytics/refinery@f09c763] (thin): Regular analytics weekly train THIN [analytics/refinery@f09c7633] (duration: 01m 33s)
  • 19:32 aqu@deploy1003: Started deploy [analytics/refinery@f09c763] (thin): Regular analytics weekly train THIN [analytics/refinery@f09c7633]
  • 19:31 aqu@deploy1003: Finished deploy [analytics/refinery@f09c763]: Regular analytics weekly train [analytics/refinery@f09c7633] (duration: 03m 09s)
  • 19:28 aqu@deploy1003: Started deploy [analytics/refinery@f09c763]: Regular analytics weekly train [analytics/refinery@f09c7633]
  • 19:21 aqu@deploy1003: Finished deploy [analytics/refinery@f09c763] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f09c7633] (duration: 03m 05s)
  • 19:19 mutante: lists.wikimedia.org - restarted apache2, added NEL headers
  • 19:18 aqu@deploy1003: Started deploy [analytics/refinery@f09c763] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f09c7633]
  • 18:41 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 18:31 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 18:28 dzahn@dns1004: END - running authdns-update
  • 18:26 dzahn@dns1004: START - running authdns-update
  • 18:13 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.14 refs T396375
  • 17:59 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 17:32 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2219 gradually with 4 steps - Repooling
  • 17:29 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:23 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 17:17 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:06 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:00 brett@dns1004: END - running authdns-update
  • 16:59 brett@dns1004: START - running authdns-update
  • 16:58 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 16:56 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:55 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:55 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:53 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:46 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2219 gradually with 4 steps - Repooling
  • 16:45 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:39 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 16:33 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:33 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 16:23 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2147 quickly with 2 steps - Repooling
  • 16:09 urbanecm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 16:08 urbanecm@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 16:06 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2147 quickly with 2 steps - Repooling
  • 16:02 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 16:01 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 15:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T399249)', diff saved to https://phabricator.wikimedia.org/P81275 and previous config saved to /var/cache/conftool/dbconfig/20250813-155650-fceratto.json
  • 15:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T399249)', diff saved to https://phabricator.wikimedia.org/P81274 and previous config saved to /var/cache/conftool/dbconfig/20250813-155440-fceratto.json
  • 15:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 15:29 akosiaris@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:27 akosiaris@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:25 akosiaris@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 akosiaris@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:40 moritzm: installing PHP 8.2 security updates
  • 14:30 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:27 jmm@dns1004: END - running authdns-update
  • 14:26 jmm@dns1004: START - running authdns-update
  • 14:25 jmm@dns1004: END - running authdns-update
  • 14:25 jmm@dns1004: START - running authdns-update
  • 14:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T400854)', diff saved to https://phabricator.wikimedia.org/P81268 and previous config saved to /var/cache/conftool/dbconfig/20250813-142421-ladsgroup.json
  • 14:22 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:21 gkyziridis@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
  • 14:20 apine@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 apine@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P81267 and previous config saved to /var/cache/conftool/dbconfig/20250813-140914-ladsgroup.json
  • 14:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2165.codfw.wmnet
  • 14:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2165 gradually with 4 steps - Upgrade of db2165.codfw.wmnet completed
  • 13:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P81266 and previous config saved to /var/cache/conftool/dbconfig/20250813-135407-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T400854)', diff saved to https://phabricator.wikimedia.org/P81265 and previous config saved to /var/cache/conftool/dbconfig/20250813-133859-ladsgroup.json
  • 13:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T400854)', diff saved to https://phabricator.wikimedia.org/P81264 and previous config saved to /var/cache/conftool/dbconfig/20250813-133619-ladsgroup.json
  • 13:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 13:27 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:25 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes madwikisource --fix # T391767
  • 13:24 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes zghwiktionary --fix # T399785
  • 13:23 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes minwikibooks --fix # T395499
  • 13:21 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Revert^2 "madwikisource: set metanamespace, sitename and timezone", Revert^2 "minwikibooks , zghwiktionary : add project talk namespace aliases" (duration: 13m 40s)
  • 13:20 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2165 gradually with 4 steps - Upgrade of db2165.codfw.wmnet completed
  • 13:16 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Continuing with sync
  • 13:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2165 - Upgrading db2165.codfw.wmnet
  • 13:12 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2165 - Upgrading db2165.codfw.wmnet
  • 13:11 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2165.codfw.wmnet
  • 13:09 lucaswerkmeister-wmde@deploy1003: anzx, lucaswerkmeister-wmde: Backport for Revert^2 "madwikisource: set metanamespace, sitename and timezone", Revert^2 "minwikibooks , zghwiktionary : add project talk namespace aliases" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:08 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
  • 13:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Revert^2 "madwikisource: set metanamespace, sitename and timezone", Revert^2 "minwikibooks , zghwiktionary : add project talk namespace aliases"
  • 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T400854)', diff saved to https://phabricator.wikimedia.org/P81261 and previous config saved to /var/cache/conftool/dbconfig/20250813-125036-ladsgroup.json
  • 12:49 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica
  • 12:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T400854)', diff saved to https://phabricator.wikimedia.org/P81260 and previous config saved to /var/cache/conftool/dbconfig/20250813-124849-ladsgroup.json
  • 12:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T400854)', diff saved to https://phabricator.wikimedia.org/P81259 and previous config saved to /var/cache/conftool/dbconfig/20250813-124734-ladsgroup.json
  • 12:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T399249)', diff saved to https://phabricator.wikimedia.org/P81258 and previous config saved to /var/cache/conftool/dbconfig/20250813-124348-fceratto.json
  • 12:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 12:43 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T399249)', diff saved to https://phabricator.wikimedia.org/P81257 and previous config saved to /var/cache/conftool/dbconfig/20250813-124326-fceratto.json
  • 12:40 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab Replica
  • 12:39 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica
  • 12:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P81256 and previous config saved to /var/cache/conftool/dbconfig/20250813-123226-ladsgroup.json
  • 12:30 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica
  • 12:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P81255 and previous config saved to /var/cache/conftool/dbconfig/20250813-122818-fceratto.json
  • 12:26 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-datanode1001.eqiad.wmnet with OS bookworm
  • 12:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P81254 and previous config saved to /var/cache/conftool/dbconfig/20250813-121719-ladsgroup.json
  • 12:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P81253 and previous config saved to /var/cache/conftool/dbconfig/20250813-121311-fceratto.json
  • 12:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T400854)', diff saved to https://phabricator.wikimedia.org/P81252 and previous config saved to /var/cache/conftool/dbconfig/20250813-120212-ladsgroup.json
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T400854)', diff saved to https://phabricator.wikimedia.org/P81251 and previous config saved to /var/cache/conftool/dbconfig/20250813-115937-ladsgroup.json
  • 11:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T400854)', diff saved to https://phabricator.wikimedia.org/P81250 and previous config saved to /var/cache/conftool/dbconfig/20250813-115913-ladsgroup.json
  • 11:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T399249)', diff saved to https://phabricator.wikimedia.org/P81249 and previous config saved to /var/cache/conftool/dbconfig/20250813-115803-fceratto.json
  • 11:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P81248 and previous config saved to /var/cache/conftool/dbconfig/20250813-114406-ladsgroup.json
  • 11:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 11:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P81247 and previous config saved to /var/cache/conftool/dbconfig/20250813-112858-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T400854)', diff saved to https://phabricator.wikimedia.org/P81246 and previous config saved to /var/cache/conftool/dbconfig/20250813-111351-ladsgroup.json
  • 11:13 jforrester@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:12 jforrester@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:11 jforrester@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:11 jforrester@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T400854)', diff saved to https://phabricator.wikimedia.org/P81245 and previous config saved to /var/cache/conftool/dbconfig/20250813-111112-ladsgroup.json
  • 11:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 11:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1054.eqiad.wmnet to cluster eqiad and group A
  • 11:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T400854)', diff saved to https://phabricator.wikimedia.org/P81244 and previous config saved to /var/cache/conftool/dbconfig/20250813-111049-ladsgroup.json
  • 11:10 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:09 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:07 jforrester@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:07 jforrester@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 10:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P81243 and previous config saved to /var/cache/conftool/dbconfig/20250813-105542-ladsgroup.json
  • 10:40 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-datanode1001
  • 10:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P81242 and previous config saved to /var/cache/conftool/dbconfig/20250813-104034-ladsgroup.json
  • 10:39 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-datanode1001
  • 10:34 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:33 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming and reprovisioning an-worker1065 as an-backup-datanode1001 - btullis@cumin1003"
  • 10:33 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming and reprovisioning an-worker1065 as an-backup-datanode1001 - btullis@cumin1003"
  • 10:31 moritzm: installing openssl updates on Bookworm
  • 10:31 fabfur: upgrading haproxykafka to v 0.3.14+deb11u2 on A:cp
  • 10:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T400854)', diff saved to https://phabricator.wikimedia.org/P81241 and previous config saved to /var/cache/conftool/dbconfig/20250813-102527-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T400854)', diff saved to https://phabricator.wikimedia.org/P81240 and previous config saved to /var/cache/conftool/dbconfig/20250813-102243-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T400854)', diff saved to https://phabricator.wikimedia.org/P81239 and previous config saved to /var/cache/conftool/dbconfig/20250813-102232-ladsgroup.json
  • 10:17 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:16 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:13 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:13 btullis@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:12 moritzm: installing openssl updates on Bookworm
  • 10:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P81238 and previous config saved to /var/cache/conftool/dbconfig/20250813-100724-ladsgroup.json
  • 10:05 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 09:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 09:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P81237 and previous config saved to /var/cache/conftool/dbconfig/20250813-095217-ladsgroup.json
  • 09:46 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker1065.eqiad.wmnet
  • 09:46 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:46 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:44 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1065.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 09:41 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 09:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T400854)', diff saved to https://phabricator.wikimedia.org/P81236 and previous config saved to /var/cache/conftool/dbconfig/20250813-093710-ladsgroup.json
  • 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T400854)', diff saved to https://phabricator.wikimedia.org/P81235 and previous config saved to /var/cache/conftool/dbconfig/20250813-093423-ladsgroup.json
  • 09:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 09:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T400854)', diff saved to https://phabricator.wikimedia.org/P81234 and previous config saved to /var/cache/conftool/dbconfig/20250813-093401-ladsgroup.json
  • 09:33 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker1065.eqiad.wmnet
  • 09:29 vgutierrez: restarting varnish on cp5017
  • 09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P81233 and previous config saved to /var/cache/conftool/dbconfig/20250813-091853-ladsgroup.json
  • 09:17 btullis@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 09:17 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 09:11 vgutierrez: restarting ATS on cp5017
  • 09:10 urbanecm: Set newprojects mailman list to moderate posts from nonmembers (previous: discard) to debug an issue with new projects announcements (T393444)
  • 09:06 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1054.eqiad.wmnet to cluster eqiad and group A
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti1053.eqiad.wmnet to cluster eqiad and group A
  • 09:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P81231 and previous config saved to /var/cache/conftool/dbconfig/20250813-090346-ladsgroup.json
  • 08:56 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:56 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 08:56 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 08:55 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1053.eqiad.wmnet to cluster eqiad and group A
  • 08:52 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 08:37 btullis@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 08:37 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1053.eqiad.wmnet
  • 08:34 btullis@cumin1003: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 08:34 btullis@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts an-worker1065.eqiad.wmnet
  • 08:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P81227 and previous config saved to /var/cache/conftool/dbconfig/20250813-083023-ladsgroup.json
  • 08:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P81226 and previous config saved to /var/cache/conftool/dbconfig/20250813-081516-ladsgroup.json
  • 08:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T400854)', diff saved to https://phabricator.wikimedia.org/P81225 and previous config saved to /var/cache/conftool/dbconfig/20250813-080008-ladsgroup.json
  • 07:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T400854)', diff saved to https://phabricator.wikimedia.org/P81224 and previous config saved to /var/cache/conftool/dbconfig/20250813-075721-ladsgroup.json
  • 07:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 07:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T400854)', diff saved to https://phabricator.wikimedia.org/P81223 and previous config saved to /var/cache/conftool/dbconfig/20250813-075658-ladsgroup.json
  • 07:52 fabfur: manually upgrading haproxykafka on cp1111 to test new metrics (T400978)
  • 07:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1012.eqiad.wmnet
  • 07:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 07:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 07:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1012.eqiad.wmnet
  • 07:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P81222 and previous config saved to /var/cache/conftool/dbconfig/20250813-074150-ladsgroup.json
  • 07:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P81221 and previous config saved to /var/cache/conftool/dbconfig/20250813-072643-ladsgroup.json
  • 07:15 kartik@deploy1003: Finished scap sync-world: Backport for Section Translation: Add Arakan Wikipedia (T392490) (duration: 11m 06s)
  • 07:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T400854)', diff saved to https://phabricator.wikimedia.org/P81220 and previous config saved to /var/cache/conftool/dbconfig/20250813-071135-ladsgroup.json
  • 07:10 kartik@deploy1003: kartik: Continuing with sync
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T400854)', diff saved to https://phabricator.wikimedia.org/P81219 and previous config saved to /var/cache/conftool/dbconfig/20250813-070849-ladsgroup.json
  • 07:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 07:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T400854)', diff saved to https://phabricator.wikimedia.org/P81218 and previous config saved to /var/cache/conftool/dbconfig/20250813-070826-ladsgroup.json
  • 07:06 kartik@deploy1003: kartik: Backport for Section Translation: Add Arakan Wikipedia (T392490) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:04 kartik@deploy1003: Started scap sync-world: Backport for Section Translation: Add Arakan Wikipedia (T392490)
  • 06:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P81216 and previous config saved to /var/cache/conftool/dbconfig/20250813-065318-ladsgroup.json
  • 06:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P81215 and previous config saved to /var/cache/conftool/dbconfig/20250813-063811-ladsgroup.json
  • 06:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T400854)', diff saved to https://phabricator.wikimedia.org/P81214 and previous config saved to /var/cache/conftool/dbconfig/20250813-062303-ladsgroup.json
  • 06:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T400854)', diff saved to https://phabricator.wikimedia.org/P81213 and previous config saved to /var/cache/conftool/dbconfig/20250813-062018-ladsgroup.json
  • 06:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 06:19 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 06:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T400854)', diff saved to https://phabricator.wikimedia.org/P81212 and previous config saved to /var/cache/conftool/dbconfig/20250813-061854-ladsgroup.json
  • 06:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P81211 and previous config saved to /var/cache/conftool/dbconfig/20250813-060347-ladsgroup.json
  • 05:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P81210 and previous config saved to /var/cache/conftool/dbconfig/20250813-054839-ladsgroup.json
  • 05:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T400854)', diff saved to https://phabricator.wikimedia.org/P81209 and previous config saved to /var/cache/conftool/dbconfig/20250813-053330-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T400854)', diff saved to https://phabricator.wikimedia.org/P81208 and previous config saved to /var/cache/conftool/dbconfig/20250813-053052-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T400854)', diff saved to https://phabricator.wikimedia.org/P81207 and previous config saved to /var/cache/conftool/dbconfig/20250813-052930-ladsgroup.json
  • 05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P81206 and previous config saved to /var/cache/conftool/dbconfig/20250813-051422-ladsgroup.json
  • 05:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T399249)', diff saved to https://phabricator.wikimedia.org/P81205 and previous config saved to /var/cache/conftool/dbconfig/20250813-051045-fceratto.json
  • 05:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 05:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T399249)', diff saved to https://phabricator.wikimedia.org/P81204 and previous config saved to /var/cache/conftool/dbconfig/20250813-051022-fceratto.json
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P81203 and previous config saved to /var/cache/conftool/dbconfig/20250813-045915-ladsgroup.json
  • 04:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P81202 and previous config saved to /var/cache/conftool/dbconfig/20250813-045514-fceratto.json
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T400854)', diff saved to https://phabricator.wikimedia.org/P81201 and previous config saved to /var/cache/conftool/dbconfig/20250813-044408-ladsgroup.json
  • 04:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T400854)', diff saved to https://phabricator.wikimedia.org/P81200 and previous config saved to /var/cache/conftool/dbconfig/20250813-044138-ladsgroup.json
  • 04:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 04:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T400854)', diff saved to https://phabricator.wikimedia.org/P81199 and previous config saved to /var/cache/conftool/dbconfig/20250813-044115-ladsgroup.json
  • 04:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P81198 and previous config saved to /var/cache/conftool/dbconfig/20250813-044006-fceratto.json
  • 04:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P81197 and previous config saved to /var/cache/conftool/dbconfig/20250813-042607-ladsgroup.json
  • 04:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T399249)', diff saved to https://phabricator.wikimedia.org/P81196 and previous config saved to /var/cache/conftool/dbconfig/20250813-042458-fceratto.json
  • 04:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P81195 and previous config saved to /var/cache/conftool/dbconfig/20250813-041100-ladsgroup.json
  • 03:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P81191 and previous config saved to /var/cache/conftool/dbconfig/20250813-033745-ladsgroup.json
  • 03:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P81190 and previous config saved to /var/cache/conftool/dbconfig/20250813-032237-ladsgroup.json
  • 03:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T400854)', diff saved to https://phabricator.wikimedia.org/P81189 and previous config saved to /var/cache/conftool/dbconfig/20250813-030729-ladsgroup.json
  • 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T400854)', diff saved to https://phabricator.wikimedia.org/P81188 and previous config saved to /var/cache/conftool/dbconfig/20250813-030254-ladsgroup.json
  • 03:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 03:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81187 and previous config saved to /var/cache/conftool/dbconfig/20250813-030231-ladsgroup.json
  • 02:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P81186 and previous config saved to /var/cache/conftool/dbconfig/20250813-024723-ladsgroup.json
  • 02:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P81185 and previous config saved to /var/cache/conftool/dbconfig/20250813-023215-ladsgroup.json
  • 02:28 ejegg: fundraising civicrm upgraded from 13fb0ba8 to 321a17c0
  • 02:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81184 and previous config saved to /var/cache/conftool/dbconfig/20250813-021708-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T400854)', diff saved to https://phabricator.wikimedia.org/P81183 and previous config saved to /var/cache/conftool/dbconfig/20250813-021434-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T400854)', diff saved to https://phabricator.wikimedia.org/P81182 and previous config saved to /var/cache/conftool/dbconfig/20250813-021411-ladsgroup.json
  • 02:06 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P81181 and previous config saved to /var/cache/conftool/dbconfig/20250813-015904-ladsgroup.json
  • 01:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P81180 and previous config saved to /var/cache/conftool/dbconfig/20250813-014357-ladsgroup.json
  • 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T400854)', diff saved to https://phabricator.wikimedia.org/P81179 and previous config saved to /var/cache/conftool/dbconfig/20250813-012849-ladsgroup.json
  • 01:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T400854)', diff saved to https://phabricator.wikimedia.org/P81178 and previous config saved to /var/cache/conftool/dbconfig/20250813-011308-ladsgroup.json
  • 01:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 01:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81177 and previous config saved to /var/cache/conftool/dbconfig/20250813-011245-ladsgroup.json
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 44s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P81176 and previous config saved to /var/cache/conftool/dbconfig/20250813-005737-ladsgroup.json
  • 00:56 sbassett: Deployed updated security mitigation for T401266
  • 00:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P81175 and previous config saved to /var/cache/conftool/dbconfig/20250813-004230-ladsgroup.json
  • 00:37 sbassett: Deployed security mitigation for T401266
  • 00:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81174 and previous config saved to /var/cache/conftool/dbconfig/20250813-002722-ladsgroup.json
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T400854)', diff saved to https://phabricator.wikimedia.org/P81173 and previous config saved to /var/cache/conftool/dbconfig/20250813-002430-ladsgroup.json
  • 00:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T400854)', diff saved to https://phabricator.wikimedia.org/P81172 and previous config saved to /var/cache/conftool/dbconfig/20250813-002407-ladsgroup.json
  • 00:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P81171 and previous config saved to /var/cache/conftool/dbconfig/20250813-000859-ladsgroup.json

2025-08-12

  • 23:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P81170 and previous config saved to /var/cache/conftool/dbconfig/20250812-235351-ladsgroup.json
  • 23:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T400854)', diff saved to https://phabricator.wikimedia.org/P81169 and previous config saved to /var/cache/conftool/dbconfig/20250812-233843-ladsgroup.json
  • 23:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T400854)', diff saved to https://phabricator.wikimedia.org/P81168 and previous config saved to /var/cache/conftool/dbconfig/20250812-233605-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81167 and previous config saved to /var/cache/conftool/dbconfig/20250812-233524-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P81166 and previous config saved to /var/cache/conftool/dbconfig/20250812-232016-ladsgroup.json
  • 23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P81165 and previous config saved to /var/cache/conftool/dbconfig/20250812-230508-ladsgroup.json
  • 22:51 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 22:51 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81164 and previous config saved to /var/cache/conftool/dbconfig/20250812-225001-ladsgroup.json
  • 22:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81163 and previous config saved to /var/cache/conftool/dbconfig/20250812-224717-ladsgroup.json
  • 22:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 22:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T400854)', diff saved to https://phabricator.wikimedia.org/P81162 and previous config saved to /var/cache/conftool/dbconfig/20250812-224655-ladsgroup.json
  • 22:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P81161 and previous config saved to /var/cache/conftool/dbconfig/20250812-223147-ladsgroup.json
  • 22:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P81160 and previous config saved to /var/cache/conftool/dbconfig/20250812-221639-ladsgroup.json
  • 22:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T400854)', diff saved to https://phabricator.wikimedia.org/P81159 and previous config saved to /var/cache/conftool/dbconfig/20250812-220132-ladsgroup.json
  • 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T400854)', diff saved to https://phabricator.wikimedia.org/P81158 and previous config saved to /var/cache/conftool/dbconfig/20250812-215849-ladsgroup.json
  • 21:58 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 21:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T400854)', diff saved to https://phabricator.wikimedia.org/P81157 and previous config saved to /var/cache/conftool/dbconfig/20250812-215826-ladsgroup.json
  • 21:54 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=search,name=eqiad
  • 21:45 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P81156 and previous config saved to /var/cache/conftool/dbconfig/20250812-214318-ladsgroup.json
  • 21:42 cjming@deploy1003: Finished scap sync-world: Backport for Revert "madwikisource: set metanamespace, sitename and timezone" (duration: 07m 51s)
  • 21:37 cjming@deploy1003: cjming: Continuing with sync
  • 21:36 cjming@deploy1003: cjming: Backport for Revert "madwikisource: set metanamespace, sitename and timezone" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:34 cjming@deploy1003: Started scap sync-world: Backport for Revert "madwikisource: set metanamespace, sitename and timezone"
  • 21:32 cjming@deploy1003: Finished scap sync-world: Backport for Revert "minwikibooks , zghwiktionary : add project talk namespace aliases" (duration: 07m 48s)
  • 21:32 swfrench@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 21:32 swfrench@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 21:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P81155 and previous config saved to /var/cache/conftool/dbconfig/20250812-212811-ladsgroup.json
  • 21:27 cjming@deploy1003: cjming: Continuing with sync
  • 21:27 cjming@deploy1003: cjming: Backport for Revert "minwikibooks , zghwiktionary : add project talk namespace aliases" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:24 cjming@deploy1003: Started scap sync-world: Backport for Revert "minwikibooks , zghwiktionary : add project talk namespace aliases"
  • 21:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T399249)', diff saved to https://phabricator.wikimedia.org/P81154 and previous config saved to /var/cache/conftool/dbconfig/20250812-212344-fceratto.json
  • 21:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 21:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:20 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:14 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:14 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 55 hosts with reason: investigate cluster quorum failure
  • 21:14 andrew@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T400854)', diff saved to https://phabricator.wikimedia.org/P81153 and previous config saved to /var/cache/conftool/dbconfig/20250812-211303-ladsgroup.json
  • 21:11 cjming@deploy1003: Finished scap sync-world: Backport for minwikibooks , zghwiktionary : add project talk namespace aliases (T399785 T395499) (duration: 10m 31s)
  • 21:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T400854)', diff saved to https://phabricator.wikimedia.org/P81152 and previous config saved to /var/cache/conftool/dbconfig/20250812-211023-ladsgroup.json
  • 21:10 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 21:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T400854)', diff saved to https://phabricator.wikimedia.org/P81151 and previous config saved to /var/cache/conftool/dbconfig/20250812-211001-ladsgroup.json
  • 21:05 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 21:02 cjming@deploy1003: cjming, anzx: Backport for minwikibooks , zghwiktionary : add project talk namespace aliases (T399785 T395499) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:00 cjming@deploy1003: Started scap sync-world: Backport for minwikibooks , zghwiktionary : add project talk namespace aliases (T399785 T395499)
  • 20:59 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 20:57 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=search,name=eqiad
  • 20:55 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 20:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P81150 and previous config saved to /var/cache/conftool/dbconfig/20250812-205453-ladsgroup.json
  • 20:48 cjming@deploy1003: Finished scap sync-world: Backport for madwikisource: set metanamespace, sitename and timezone (T391767) (duration: 10m 02s)
  • 20:42 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:40 cjming@deploy1003: cjming, anzx: Backport for madwikisource: set metanamespace, sitename and timezone (T391767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P81149 and previous config saved to /var/cache/conftool/dbconfig/20250812-203945-ladsgroup.json
  • 20:38 cjming@deploy1003: Started scap sync-world: Backport for madwikisource: set metanamespace, sitename and timezone (T391767)
  • 20:36 cjming@deploy1003: Finished scap sync-world: Backport for zghwiktionary: add logos (T399785) (duration: 08m 42s)
  • 20:30 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:29 cjming@deploy1003: cjming, anzx: Backport for zghwiktionary: add logos (T399785) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:27 cjming@deploy1003: Started scap sync-world: Backport for zghwiktionary: add logos (T399785)
  • 20:25 cjming@deploy1003: Finished scap sync-world: Backport for madwikisource: add logo (T391767) (duration: 08m 21s)
  • 20:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T400854)', diff saved to https://phabricator.wikimedia.org/P81148 and previous config saved to /var/cache/conftool/dbconfig/20250812-202437-ladsgroup.json
  • 20:19 cjming@deploy1003: cjming, anzx: Continuing with sync
  • 20:19 cjming@deploy1003: cjming, anzx: Backport for madwikisource: add logo (T391767) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T400854)', diff saved to https://phabricator.wikimedia.org/P81147 and previous config saved to /var/cache/conftool/dbconfig/20250812-201754-ladsgroup.json
  • 20:17 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 20:16 cjming@deploy1003: Started scap sync-world: Backport for madwikisource: add logo (T391767)
  • 20:15 cwhite: remove thanos-query.discovery.wmnet old puppet cert - T401671
  • 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T400854)', diff saved to https://phabricator.wikimedia.org/P81146 and previous config saved to /var/cache/conftool/dbconfig/20250812-201358-ladsgroup.json
  • 20:12 dbrant@deploy1003: Finished scap sync-world: Backport for Add app_activity_tab event stream. (T399630) (duration: 08m 41s)
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T400854)', diff saved to https://phabricator.wikimedia.org/P81144 and previous config saved to /var/cache/conftool/dbconfig/20250812-201205-ladsgroup.json
  • 20:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 20:07 dbrant@deploy1003: dbrant: Continuing with sync
  • 20:06 dbrant@deploy1003: dbrant: Backport for Add app_activity_tab event stream. (T399630) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:04 dbrant@deploy1003: Started scap sync-world: Backport for Add app_activity_tab event stream. (T399630)
  • 19:57 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 19:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 19:51 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 19:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 19:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 19:41 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 19:28 cmooney@dns2005: END - running authdns-update
  • 19:28 cmooney@dns2005: START - running authdns-update
  • 19:14 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:14 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 19:14 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for nokia switches codfw - cmooney@cumin1003"
  • 19:11 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 18:41 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:37 cmooney@cumin1003: START - Cookbook sre.dns.netbox
  • 18:34 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 18:24 andrew@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 18:13 jhuneidi@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.14 refs T396375
  • 17:30 swfrench@deploy1003: Finished scap sync-world: Backport for image-suggestion: reconfigure for data-gateway listener (T368096) (duration: 22m 28s)
  • 17:25 swfrench@deploy1003: swfrench, eevans: Continuing with sync
  • 17:10 swfrench@deploy1003: swfrench, eevans: Backport for image-suggestion: reconfigure for data-gateway listener (T368096) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:08 swfrench@deploy1003: Started scap sync-world: Backport for image-suggestion: reconfigure for data-gateway listener (T368096)
  • 17:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-namenode1002.eqiad.wmnet with OS bookworm
  • 16:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-namenode1002.eqiad.wmnet with reason: host reimage
  • 16:38 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-namenode1002.eqiad.wmnet with reason: host reimage
  • 16:28 dancy@deploy1003: Installation of scap version "4.200.0" completed for 2 hosts
  • 16:26 dancy@deploy1003: Installing scap version "4.200.0" for 2 host(s)
  • 16:23 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T401713', diff saved to https://phabricator.wikimedia.org/P81143 and previous config saved to /var/cache/conftool/dbconfig/20250812-162306-fceratto.json
  • 16:22 federico3: Starting s8 codfw failover from db2165 to db2161 - T401713
  • 16:14 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T401713', diff saved to https://phabricator.wikimedia.org/P81142 and previous config saved to /var/cache/conftool/dbconfig/20250812-161402-fceratto.json
  • 16:13 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T401713
  • 16:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 16:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 16:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 16:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 15:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 15:58 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-namenode1002.eqiad.wmnet with OS bookworm
  • 15:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 15:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P81141 and previous config saved to /var/cache/conftool/dbconfig/20250812-155616-fceratto.json
  • 15:51 dancy@deploy1003: Installation of scap version "4.199.0" completed for 2 hosts
  • 15:49 dancy@deploy1003: Installing scap version "4.199.0" for 2 host(s)
  • 15:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P81140 and previous config saved to /var/cache/conftool/dbconfig/20250812-154109-fceratto.json
  • 15:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-backup-namenode1001.eqiad.wmnet with OS bookworm
  • 15:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P81139 and previous config saved to /var/cache/conftool/dbconfig/20250812-152601-fceratto.json
  • 15:24 sukhe: restart varnish-frontend on cp5026
  • 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P81138 and previous config saved to /var/cache/conftool/dbconfig/20250812-151053-fceratto.json
  • 15:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 15:09 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T391056)', diff saved to https://phabricator.wikimedia.org/P81137 and previous config saved to /var/cache/conftool/dbconfig/20250812-150944-fceratto.json
  • 15:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 15:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T399249)', diff saved to https://phabricator.wikimedia.org/P81136 and previous config saved to /var/cache/conftool/dbconfig/20250812-150935-fceratto.json
  • 15:08 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-backup-namenode1001.eqiad.wmnet with reason: host reimage
  • 15:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 15:07 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 15:04 godog: tcpdump dhcp traffic capture on cloudnet1005 and cloudnet1006 - T400223
  • 15:02 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on an-backup-namenode1001.eqiad.wmnet with reason: host reimage
  • 15:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 15:01 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 15:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:56 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P81135 and previous config saved to /var/cache/conftool/dbconfig/20250812-145428-fceratto.json
  • 14:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P81134 and previous config saved to /var/cache/conftool/dbconfig/20250812-143920-fceratto.json
  • 14:35 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 14:34 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host an-backup-namenode1001.eqiad.wmnet with OS bookworm
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T399249)', diff saved to https://phabricator.wikimedia.org/P81132 and previous config saved to /var/cache/conftool/dbconfig/20250812-142413-fceratto.json
  • 14:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T391056)', diff saved to https://phabricator.wikimedia.org/P81131 and previous config saved to /var/cache/conftool/dbconfig/20250812-142400-fceratto.json
  • 14:17 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1101 to an-backup-namenode1002
  • 14:16 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-namenode1002
  • 14:15 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-namenode1002
  • 14:15 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-namenode1002 on all recursors
  • 14:15 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-namenode1002 on all recursors
  • 14:15 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1101 to an-backup-namenode1002 - btullis@cumin1003"
  • 14:14 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1101 to an-backup-namenode1002 - btullis@cumin1003"
  • 14:11 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:10 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1101 to an-backup-namenode1002
  • 14:10 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1100 to an-backup-namenode1001
  • 14:09 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-backup-namenode1001
  • 14:08 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host an-backup-namenode1001
  • 14:08 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) an-backup-namenode1001 on all recursors
  • 14:08 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache an-backup-namenode1001 on all recursors
  • 14:08 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1100 to an-backup-namenode1001 - btullis@cumin1003"
  • 14:08 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1100 to an-backup-namenode1001 - btullis@cumin1003"
  • 14:06 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T391056)', diff saved to https://phabricator.wikimedia.org/P81130 and previous config saved to /var/cache/conftool/dbconfig/20250812-140603-fceratto.json
  • 14:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 14:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 14:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P81129 and previous config saved to /var/cache/conftool/dbconfig/20250812-140523-fceratto.json
  • 14:04 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:04 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1100 to an-backup-namenode1001
  • 14:03 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:01 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes minwikibooks --fix # T395499
  • 14:01 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes tlwikisource --fix # T388654
  • 14:00 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 13:59 lucaswerkmeister-wmde@deploy1003: mwscript-k8s job started: namespaceDupes zghwiktionary --fix # T399785
  • 13:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 13:58 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1054.eqiad.wmnet
  • 13:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 13:55 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 13:54 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for tlwikisource: set timezone (T388654), zghwiktionary: set sitename, timezone & metanamespace (T399785), minwikibooks: set sitename, metanamespace and timezone (T395499), tlwikisource: add author ( Manunulat ) namespace (T388654) (duration: 18m 21s)
  • 13:52 sukhe@dns1004: END - running authdns-update
  • 13:51 sukhe@dns1004: START - running authdns-update
  • 13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1054.eqiad.wmnet
  • 13:50 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox
  • 13:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P81128 and previous config saved to /var/cache/conftool/dbconfig/20250812-135016-fceratto.json
  • 13:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1053.eqiad.wmnet
  • 13:49 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1053.eqiad.wmnet
  • 13:38 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, anzx: Backport for tlwikisource: set timezone (T388654), zghwiktionary: set sitename, timezone & metanamespace (T399785), minwikibooks: set sitename, metanamespace and timezone (T395499), tlwikisource: add author ( Manunulat ) namespace (T388654) synced to the testservers (see https://wi
  • 13:36 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for tlwikisource: set timezone (T388654), zghwiktionary: set sitename, timezone & metanamespace (T399785), minwikibooks: set sitename, metanamespace and timezone (T395499), tlwikisource: add author ( Manunulat ) namespace (T388654)
  • 13:35 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 13:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P81127 and previous config saved to /var/cache/conftool/dbconfig/20250812-133508-fceratto.json
  • 13:23 tchanders@deploy1003: Finished scap sync-world: Backport for Enable temporary accounts for special/non-standard/private wikis (T400672) (duration: 18m 09s)
  • 13:22 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81126 and previous config saved to /var/cache/conftool/dbconfig/20250812-132155-ladsgroup.json
  • 13:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P81125 and previous config saved to /var/cache/conftool/dbconfig/20250812-132001-fceratto.json
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T391056)', diff saved to https://phabricator.wikimedia.org/P81124 and previous config saved to /var/cache/conftool/dbconfig/20250812-131851-fceratto.json
  • 13:18 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 13:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P81123 and previous config saved to /var/cache/conftool/dbconfig/20250812-131829-fceratto.json
  • 13:16 tchanders@deploy1003: stran, tchanders: Continuing with sync
  • 13:10 tchanders@deploy1003: stran, tchanders: Backport for Enable temporary accounts for special/non-standard/private wikis (T400672) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P81122 and previous config saved to /var/cache/conftool/dbconfig/20250812-130648-ladsgroup.json
  • 13:05 tchanders@deploy1003: Started scap sync-world: Backport for Enable temporary accounts for special/non-standard/private wikis (T400672)
  • 13:05 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough and A:wikidough
  • 13:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P81121 and previous config saved to /var/cache/conftool/dbconfig/20250812-130321-fceratto.json
  • 12:57 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1096-1099].eqiad.wmnet
  • 12:57 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:57 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1096-1099].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:57 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1096-1099].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 12:54 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 12:53 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 12:53 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 12:53 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 12:52 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 12:52 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox
  • 12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P81120 and previous config saved to /var/cache/conftool/dbconfig/20250812-125140-ladsgroup.json
  • 12:51 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:50 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough and A:wikidough
  • 12:50 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P81119 and previous config saved to /var/cache/conftool/dbconfig/20250812-124814-fceratto.json
  • 12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81118 and previous config saved to /var/cache/conftool/dbconfig/20250812-123633-ladsgroup.json
  • 12:34 dbrant@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 12:34 dbrant@deploy1003: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T400854)', diff saved to https://phabricator.wikimedia.org/P81117 and previous config saved to /var/cache/conftool/dbconfig/20250812-123357-ladsgroup.json
  • 12:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 12:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T400854)', diff saved to https://phabricator.wikimedia.org/P81116 and previous config saved to /var/cache/conftool/dbconfig/20250812-123334-ladsgroup.json
  • 12:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P81115 and previous config saved to /var/cache/conftool/dbconfig/20250812-123306-fceratto.json
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T391056)', diff saved to https://phabricator.wikimedia.org/P81114 and previous config saved to /var/cache/conftool/dbconfig/20250812-123157-fceratto.json
  • 12:31 dbrant@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 12:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P81113 and previous config saved to /var/cache/conftool/dbconfig/20250812-123145-fceratto.json
  • 12:31 dbrant@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 12:31 dbrant@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 12:30 dbrant@deploy1003: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 12:30 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker[1096-1099].eqiad.wmnet
  • 12:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P81112 and previous config saved to /var/cache/conftool/dbconfig/20250812-121827-ladsgroup.json
  • 12:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P81111 and previous config saved to /var/cache/conftool/dbconfig/20250812-121638-fceratto.json
  • 12:05 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 12:04 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P81110 and previous config saved to /var/cache/conftool/dbconfig/20250812-120319-ladsgroup.json
  • 12:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P81109 and previous config saved to /var/cache/conftool/dbconfig/20250812-120131-fceratto.json
  • 11:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 11:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 11:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T400854)', diff saved to https://phabricator.wikimedia.org/P81108 and previous config saved to /var/cache/conftool/dbconfig/20250812-114812-ladsgroup.json
  • 11:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P81107 and previous config saved to /var/cache/conftool/dbconfig/20250812-114623-fceratto.json
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T400854)', diff saved to https://phabricator.wikimedia.org/P81106 and previous config saved to /var/cache/conftool/dbconfig/20250812-114527-ladsgroup.json
  • 11:45 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 11:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T391056)', diff saved to https://phabricator.wikimedia.org/P81105 and previous config saved to /var/cache/conftool/dbconfig/20250812-114514-fceratto.json
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T400854)', diff saved to https://phabricator.wikimedia.org/P81104 and previous config saved to /var/cache/conftool/dbconfig/20250812-114504-ladsgroup.json
  • 11:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P81103 and previous config saved to /var/cache/conftool/dbconfig/20250812-114455-fceratto.json
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P81102 and previous config saved to /var/cache/conftool/dbconfig/20250812-112956-ladsgroup.json
  • 11:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P81101 and previous config saved to /var/cache/conftool/dbconfig/20250812-112948-fceratto.json
  • 11:29 moritzm: installing gnutls security updates on Bookworm
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P81100 and previous config saved to /var/cache/conftool/dbconfig/20250812-111449-ladsgroup.json
  • 11:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P81099 and previous config saved to /var/cache/conftool/dbconfig/20250812-111440-fceratto.json
  • 11:04 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.14 refs T396375 (duration: 43m 06s)
  • 10:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T400854)', diff saved to https://phabricator.wikimedia.org/P81098 and previous config saved to /var/cache/conftool/dbconfig/20250812-105941-ladsgroup.json
  • 10:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P81097 and previous config saved to /var/cache/conftool/dbconfig/20250812-105933-fceratto.json
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T391056)', diff saved to https://phabricator.wikimedia.org/P81096 and previous config saved to /var/cache/conftool/dbconfig/20250812-105824-fceratto.json
  • 10:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 10:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P81095 and previous config saved to /var/cache/conftool/dbconfig/20250812-105801-fceratto.json
  • 10:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T400854)', diff saved to https://phabricator.wikimedia.org/P81094 and previous config saved to /var/cache/conftool/dbconfig/20250812-105657-ladsgroup.json
  • 10:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T400854)', diff saved to https://phabricator.wikimedia.org/P81093 and previous config saved to /var/cache/conftool/dbconfig/20250812-105646-ladsgroup.json
  • 10:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P81092 and previous config saved to /var/cache/conftool/dbconfig/20250812-104254-fceratto.json
  • 10:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P81091 and previous config saved to /var/cache/conftool/dbconfig/20250812-104138-ladsgroup.json
  • 10:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P81090 and previous config saved to /var/cache/conftool/dbconfig/20250812-102746-fceratto.json
  • 10:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P81089 and previous config saved to /var/cache/conftool/dbconfig/20250812-102631-ladsgroup.json
  • 10:21 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.14 refs T396375
  • 10:18 hashar: systemctl start train-presync # T396375
  • 10:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P81088 and previous config saved to /var/cache/conftool/dbconfig/20250812-101238-fceratto.json
  • 10:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T400854)', diff saved to https://phabricator.wikimedia.org/P81087 and previous config saved to /var/cache/conftool/dbconfig/20250812-101123-ladsgroup.json
  • 10:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 12s)
  • 10:10 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T391056)', diff saved to https://phabricator.wikimedia.org/P81086 and previous config saved to /var/cache/conftool/dbconfig/20250812-101029-fceratto.json
  • 10:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 10:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P81085 and previous config saved to /var/cache/conftool/dbconfig/20250812-101006-fceratto.json
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T400854)', diff saved to https://phabricator.wikimedia.org/P81084 and previous config saved to /var/cache/conftool/dbconfig/20250812-100840-ladsgroup.json
  • 10:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 10:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T400854)', diff saved to https://phabricator.wikimedia.org/P81083 and previous config saved to /var/cache/conftool/dbconfig/20250812-100817-ladsgroup.json
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1020.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1020.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1020.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1020.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1019.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1019.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1019.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1019.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1018.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1018.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1018.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1018.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=nginx,name=ms-fe1017.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=nginx,name=ms-fe1017.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=swift-fe,name=ms-fe1017.eqiad.wmnet
  • 10:08 mvernon@cumin2002: conftool action : set/weight=40; selector: service=swift-fe,name=ms-fe1017.eqiad.wmnet
  • 10:07 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 10:02 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:01 hashar: systemctl start pretrain # T396375
  • 10:01 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 09:59 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 09:56 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-fe[2017-2020].codfw.wmnet
  • 09:56 mvernon@cumin2002: START - Cookbook sre.hosts.remove-downtime for ms-fe[2017-2020].codfw.wmnet
  • 09:55 hashar: systemctl start pretrain # T396375
  • 09:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P81082 and previous config saved to /var/cache/conftool/dbconfig/20250812-095458-fceratto.json
  • 09:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P81081 and previous config saved to /var/cache/conftool/dbconfig/20250812-095310-ladsgroup.json
  • 09:49 urbanecm@deploy1003: Finished scap sync-world: Backport for Add CommunityConfigurationExample to extension-list (T372049) (duration: 43m 21s)
  • 09:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
  • 09:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P81079 and previous config saved to /var/cache/conftool/dbconfig/20250812-093951-fceratto.json
  • 09:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
  • 09:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P81078 and previous config saved to /var/cache/conftool/dbconfig/20250812-093803-ladsgroup.json
  • 09:28 mvernon@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-fe[2017-2020].codfw.wmnet with reason: reboot
  • 09:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P81077 and previous config saved to /var/cache/conftool/dbconfig/20250812-092443-fceratto.json
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T391056)', diff saved to https://phabricator.wikimedia.org/P81076 and previous config saved to /var/cache/conftool/dbconfig/20250812-092334-fceratto.json
  • 09:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 09:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P81075 and previous config saved to /var/cache/conftool/dbconfig/20250812-092310-fceratto.json
  • 09:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T400854)', diff saved to https://phabricator.wikimedia.org/P81074 and previous config saved to /var/cache/conftool/dbconfig/20250812-092255-ladsgroup.json
  • 09:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T400854)', diff saved to https://phabricator.wikimedia.org/P81073 and previous config saved to /var/cache/conftool/dbconfig/20250812-092011-ladsgroup.json
  • 09:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 09:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T400854)', diff saved to https://phabricator.wikimedia.org/P81072 and previous config saved to /var/cache/conftool/dbconfig/20250812-091948-ladsgroup.json
  • 09:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "use a var to match all headers on haproxy - vgutierrez@cumin1002"
  • 09:18 vgutierrez@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: use a var to match all headers on haproxy - vgutierrez@cumin1002
  • 09:18 suzannewoodWMDE2: Finished populateSitesTable for 'zghwiktionary' https://phabricator.wikimedia.org/T399789
  • 09:17 vgutierrez@cumin1002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: use a var to match all headers on haproxy - vgutierrez@cumin1002
  • 09:17 vgutierrez@cumin1002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "use a var to match all headers on haproxy - vgutierrez@cumin1002"
  • 09:17 zabe: manually insert 'SecurePoll' into zhwiki.content_models # T401641
  • 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P81071 and previous config saved to /var/cache/conftool/dbconfig/20250812-090802-fceratto.json
  • 09:06 urbanecm@deploy1003: Started scap sync-world: Backport for Add CommunityConfigurationExample to extension-list (T372049)
  • 09:05 urbanecm@deploy1003: Finished scap sync-world: Backport for Remove centralauth-unmerge from stewards (T400755) (duration: 09m 05s)
  • 09:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P81070 and previous config saved to /var/cache/conftool/dbconfig/20250812-090441-ladsgroup.json
  • 09:00 urbanecm@deploy1003: zabe, urbanecm: Continuing with sync
  • 08:58 urbanecm@deploy1003: zabe, urbanecm: Backport for Remove centralauth-unmerge from stewards (T400755) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:56 urbanecm@deploy1003: Started scap sync-world: Backport for Remove centralauth-unmerge from stewards (T400755)
  • 08:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P81069 and previous config saved to /var/cache/conftool/dbconfig/20250812-085254-fceratto.json
  • 08:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis rkiwiki, minwikibooks, zghwiktionary, madwikisource, tlwikisource in section s5
  • 08:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P81068 and previous config saved to /var/cache/conftool/dbconfig/20250812-084933-ladsgroup.json
  • 08:48 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis rkiwiki, minwikibooks, zghwiktionary, madwikisource, tlwikisource in section s5
  • 08:46 suzannewoodWMDE2: suzannewood@deploy1003:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 08:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 08:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P81067 and previous config saved to /var/cache/conftool/dbconfig/20250812-083746-fceratto.json
  • 08:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T391056)', diff saved to https://phabricator.wikimedia.org/P81066 and previous config saved to /var/cache/conftool/dbconfig/20250812-083637-fceratto.json
  • 08:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 08:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P81065 and previous config saved to /var/cache/conftool/dbconfig/20250812-083603-fceratto.json
  • 08:35 ladsgroup@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 08:35 ladsgroup@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 08:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T400854)', diff saved to https://phabricator.wikimedia.org/P81064 and previous config saved to /var/cache/conftool/dbconfig/20250812-083426-ladsgroup.json
  • 08:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T400854)', diff saved to https://phabricator.wikimedia.org/P81063 and previous config saved to /var/cache/conftool/dbconfig/20250812-083141-ladsgroup.json
  • 08:31 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 08:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T400854)', diff saved to https://phabricator.wikimedia.org/P81062 and previous config saved to /var/cache/conftool/dbconfig/20250812-083119-ladsgroup.json
  • 08:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P81061 and previous config saved to /var/cache/conftool/dbconfig/20250812-082056-fceratto.json
  • 08:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P81060 and previous config saved to /var/cache/conftool/dbconfig/20250812-081611-ladsgroup.json
  • 08:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P81059 and previous config saved to /var/cache/conftool/dbconfig/20250812-080549-fceratto.json
  • 08:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P81058 and previous config saved to /var/cache/conftool/dbconfig/20250812-080104-ladsgroup.json
  • 07:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P81057 and previous config saved to /var/cache/conftool/dbconfig/20250812-075041-fceratto.json
  • 07:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T391056)', diff saved to https://phabricator.wikimedia.org/P81056 and previous config saved to /var/cache/conftool/dbconfig/20250812-074932-fceratto.json
  • 07:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 07:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 07:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T400854)', diff saved to https://phabricator.wikimedia.org/P81055 and previous config saved to /var/cache/conftool/dbconfig/20250812-074556-ladsgroup.json
  • 07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T400854)', diff saved to https://phabricator.wikimedia.org/P81054 and previous config saved to /var/cache/conftool/dbconfig/20250812-074312-ladsgroup.json
  • 07:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 07:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T400854)', diff saved to https://phabricator.wikimedia.org/P81053 and previous config saved to /var/cache/conftool/dbconfig/20250812-074249-ladsgroup.json
  • 07:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P81052 and previous config saved to /var/cache/conftool/dbconfig/20250812-072742-ladsgroup.json
  • 07:17 hashar@deploy1003: Finished deploy [integration/docroot@77c4765]: build: Updating mediawiki/mediawiki-phan-config to 0.17.0 (duration: 00m 13s)
  • 07:17 hashar@deploy1003: Started deploy [integration/docroot@77c4765]: build: Updating mediawiki/mediawiki-phan-config to 0.17.0
  • 07:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P81051 and previous config saved to /var/cache/conftool/dbconfig/20250812-071234-ladsgroup.json
  • 06:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T400854)', diff saved to https://phabricator.wikimedia.org/P81050 and previous config saved to /var/cache/conftool/dbconfig/20250812-065726-ladsgroup.json
  • 06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T400854)', diff saved to https://phabricator.wikimedia.org/P81049 and previous config saved to /var/cache/conftool/dbconfig/20250812-065443-ladsgroup.json
  • 06:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 06:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T400854)', diff saved to https://phabricator.wikimedia.org/P81048 and previous config saved to /var/cache/conftool/dbconfig/20250812-065420-ladsgroup.json
  • 06:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T399249)', diff saved to https://phabricator.wikimedia.org/P81047 and previous config saved to /var/cache/conftool/dbconfig/20250812-064209-fceratto.json
  • 06:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 06:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T399249)', diff saved to https://phabricator.wikimedia.org/P81046 and previous config saved to /var/cache/conftool/dbconfig/20250812-064146-fceratto.json
  • 06:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P81045 and previous config saved to /var/cache/conftool/dbconfig/20250812-063913-ladsgroup.json
  • 06:32 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 06:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P81044 and previous config saved to /var/cache/conftool/dbconfig/20250812-062638-fceratto.json
  • 06:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P81043 and previous config saved to /var/cache/conftool/dbconfig/20250812-062405-ladsgroup.json
  • 06:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P81042 and previous config saved to /var/cache/conftool/dbconfig/20250812-061130-fceratto.json
  • 06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T400854)', diff saved to https://phabricator.wikimedia.org/P81041 and previous config saved to /var/cache/conftool/dbconfig/20250812-060857-ladsgroup.json
  • 06:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 06:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T400854)', diff saved to https://phabricator.wikimedia.org/P81040 and previous config saved to /var/cache/conftool/dbconfig/20250812-060705-ladsgroup.json
  • 06:06 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 06:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T400854)', diff saved to https://phabricator.wikimedia.org/P81039 and previous config saved to /var/cache/conftool/dbconfig/20250812-060559-ladsgroup.json
  • 05:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T399249)', diff saved to https://phabricator.wikimedia.org/P81038 and previous config saved to /var/cache/conftool/dbconfig/20250812-055623-fceratto.json
  • 05:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P81037 and previous config saved to /var/cache/conftool/dbconfig/20250812-055052-ladsgroup.json
  • 05:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P81036 and previous config saved to /var/cache/conftool/dbconfig/20250812-053544-ladsgroup.json
  • 05:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T400854)', diff saved to https://phabricator.wikimedia.org/P81035 and previous config saved to /var/cache/conftool/dbconfig/20250812-052037-ladsgroup.json
  • 05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T400854)', diff saved to https://phabricator.wikimedia.org/P81034 and previous config saved to /var/cache/conftool/dbconfig/20250812-051757-ladsgroup.json
  • 05:17 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 05:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T400854)', diff saved to https://phabricator.wikimedia.org/P81033 and previous config saved to /var/cache/conftool/dbconfig/20250812-051735-ladsgroup.json
  • 05:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P81032 and previous config saved to /var/cache/conftool/dbconfig/20250812-050227-ladsgroup.json
  • 04:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P81031 and previous config saved to /var/cache/conftool/dbconfig/20250812-044719-ladsgroup.json
  • 04:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T400854)', diff saved to https://phabricator.wikimedia.org/P81030 and previous config saved to /var/cache/conftool/dbconfig/20250812-043212-ladsgroup.json
  • 04:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T400854)', diff saved to https://phabricator.wikimedia.org/P81029 and previous config saved to /var/cache/conftool/dbconfig/20250812-042937-ladsgroup.json
  • 04:29 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 04:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T400854)', diff saved to https://phabricator.wikimedia.org/P81028 and previous config saved to /var/cache/conftool/dbconfig/20250812-042915-ladsgroup.json
  • 04:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P81027 and previous config saved to /var/cache/conftool/dbconfig/20250812-041408-ladsgroup.json
  • 04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.11 (duration: 04m 19s)
  • 03:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P81026 and previous config saved to /var/cache/conftool/dbconfig/20250812-035900-ladsgroup.json
  • 03:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T400854)', diff saved to https://phabricator.wikimedia.org/P81024 and previous config saved to /var/cache/conftool/dbconfig/20250812-034107-ladsgroup.json
  • 03:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 03:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T400854)', diff saved to https://phabricator.wikimedia.org/P81023 and previous config saved to /var/cache/conftool/dbconfig/20250812-034045-ladsgroup.json
  • 03:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P81022 and previous config saved to /var/cache/conftool/dbconfig/20250812-032537-ladsgroup.json
  • 03:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P81021 and previous config saved to /var/cache/conftool/dbconfig/20250812-031029-ladsgroup.json
  • 02:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T400854)', diff saved to https://phabricator.wikimedia.org/P81020 and previous config saved to /var/cache/conftool/dbconfig/20250812-025522-ladsgroup.json
  • 02:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T400854)', diff saved to https://phabricator.wikimedia.org/P81019 and previous config saved to /var/cache/conftool/dbconfig/20250812-025239-ladsgroup.json
  • 02:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 02:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T400854)', diff saved to https://phabricator.wikimedia.org/P81018 and previous config saved to /var/cache/conftool/dbconfig/20250812-025216-ladsgroup.json
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P81017 and previous config saved to /var/cache/conftool/dbconfig/20250812-023709-ladsgroup.json
  • 02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P81016 and previous config saved to /var/cache/conftool/dbconfig/20250812-022201-ladsgroup.json
  • 02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T400854)', diff saved to https://phabricator.wikimedia.org/P81015 and previous config saved to /var/cache/conftool/dbconfig/20250812-020653-ladsgroup.json
  • 02:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T400854)', diff saved to https://phabricator.wikimedia.org/P81014 and previous config saved to /var/cache/conftool/dbconfig/20250812-020403-ladsgroup.json
  • 02:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 02:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T400854)', diff saved to https://phabricator.wikimedia.org/P81013 and previous config saved to /var/cache/conftool/dbconfig/20250812-020341-ladsgroup.json
  • 01:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P81012 and previous config saved to /var/cache/conftool/dbconfig/20250812-014833-ladsgroup.json
  • 01:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P81011 and previous config saved to /var/cache/conftool/dbconfig/20250812-013325-ladsgroup.json
  • 01:30 ejegg: payments-wiki upgraded from 0a1084a8 to cd876775
  • 01:26 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 01:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T400854)', diff saved to https://phabricator.wikimedia.org/P81010 and previous config saved to /var/cache/conftool/dbconfig/20250812-011817-ladsgroup.json
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T400854)', diff saved to https://phabricator.wikimedia.org/P81009 and previous config saved to /var/cache/conftool/dbconfig/20250812-011427-ladsgroup.json
  • 01:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T400854)', diff saved to https://phabricator.wikimedia.org/P81008 and previous config saved to /var/cache/conftool/dbconfig/20250812-011403-ladsgroup.json
  • 01:01 eileen: civicrm upgraded from ebb98a9e to 13fb0ba8
  • 00:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P81007 and previous config saved to /var/cache/conftool/dbconfig/20250812-005856-ladsgroup.json
  • 00:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P81006 and previous config saved to /var/cache/conftool/dbconfig/20250812-004349-ladsgroup.json
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T400854)', diff saved to https://phabricator.wikimedia.org/P81005 and previous config saved to /var/cache/conftool/dbconfig/20250812-002841-ladsgroup.json
  • 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T400854)', diff saved to https://phabricator.wikimedia.org/P81004 and previous config saved to /var/cache/conftool/dbconfig/20250812-002553-ladsgroup.json
  • 00:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 00:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T400854)', diff saved to https://phabricator.wikimedia.org/P81003 and previous config saved to /var/cache/conftool/dbconfig/20250812-002530-ladsgroup.json
  • 00:12 dzahn@dns1004: END - running authdns-update
  • 00:11 dzahn@dns1004: START - running authdns-update
  • 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P81002 and previous config saved to /var/cache/conftool/dbconfig/20250812-001022-ladsgroup.json

2025-08-11

  • 23:56 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1044.eqiad.wmnet with OS bookworm
  • 23:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P81001 and previous config saved to /var/cache/conftool/dbconfig/20250811-235515-ladsgroup.json
  • 23:44 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1046.eqiad.wmnet with OS bookworm
  • 23:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1068.eqiad.wmnet with OS bookworm
  • 23:43 vriley@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T400854)', diff saved to https://phabricator.wikimedia.org/P81000 and previous config saved to /var/cache/conftool/dbconfig/20250811-234007-ladsgroup.json
  • 23:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T400854)', diff saved to https://phabricator.wikimedia.org/P80999 and previous config saved to /var/cache/conftool/dbconfig/20250811-233712-ladsgroup.json
  • 23:37 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 23:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T400854)', diff saved to https://phabricator.wikimedia.org/P80998 and previous config saved to /var/cache/conftool/dbconfig/20250811-233545-ladsgroup.json
  • 23:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80997 and previous config saved to /var/cache/conftool/dbconfig/20250811-232038-ladsgroup.json
  • 23:19 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 23:11 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80996 and previous config saved to /var/cache/conftool/dbconfig/20250811-230530-ladsgroup.json
  • 22:59 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 22:56 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1046.eqiad.wmnet with reason: host reimage
  • 22:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T400854)', diff saved to https://phabricator.wikimedia.org/P80995 and previous config saved to /var/cache/conftool/dbconfig/20250811-225023-ladsgroup.json
  • 22:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T400854)', diff saved to https://phabricator.wikimedia.org/P80994 and previous config saved to /var/cache/conftool/dbconfig/20250811-224841-ladsgroup.json
  • 22:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 22:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 22:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:46 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1044
  • 22:45 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1044
  • 22:44 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:44 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1044 - vriley@cumin1002"
  • 22:44 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1044 - vriley@cumin1002"
  • 22:41 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 22:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis rkiwiki, minwikibooks, zghwiktionary, madwikisource, tlwikisource in section s5
  • 22:34 zabe@deploy1003: Finished scap sync-world: Backport for Update interwiki cache (duration: 08m 08s)
  • 22:31 ladsgroup@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis rkiwiki, minwikibooks, zghwiktionary, madwikisource, tlwikisource in section s5
  • 22:28 zabe@deploy1003: zabe: Continuing with sync
  • 22:27 zabe@deploy1003: zabe: Backport for Update interwiki cache synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:25 zabe@deploy1003: Started scap sync-world: Backport for Update interwiki cache
  • 22:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1046.eqiad.wmnet with OS bookworm
  • 22:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:20 zabe@deploy1003: Finished scap sync-world: Backport for Activate zghwiktionary (T399684) (duration: 07m 54s)
  • 22:15 zabe@deploy1003: zabe: Continuing with sync
  • 22:15 zabe@deploy1003: zabe: Backport for Activate zghwiktionary (T399684) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:12 zabe@deploy1003: Started scap sync-world: Backport for Activate zghwiktionary (T399684)
  • 22:12 zabe@deploy1003: Finished scap sync-world: Backport for Activate minwikibooks (T395452) (duration: 08m 27s)
  • 22:11 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:11 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 22:08 zabe@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 22:08 zabe@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 22:07 zabe@deploy1003: zabe: Continuing with sync
  • 22:06 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T399249)', diff saved to https://phabricator.wikimedia.org/P80993 and previous config saved to /var/cache/conftool/dbconfig/20250811-220650-fceratto.json
  • 22:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 22:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T399249)', diff saved to https://phabricator.wikimedia.org/P80992 and previous config saved to /var/cache/conftool/dbconfig/20250811-220627-fceratto.json
  • 22:06 zabe@deploy1003: zabe: Backport for Activate minwikibooks (T395452) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 22:04 zabe@deploy1003: Started scap sync-world: Backport for Activate minwikibooks (T395452)
  • 22:00 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:52 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P80991 and previous config saved to /var/cache/conftool/dbconfig/20250811-215120-fceratto.json
  • 21:39 sbassett@deploy1003: Finished scap sync-world: Security deployments (duration: 02m 18s)
  • 21:37 sbassett@deploy1003: Started scap sync-world: Security deployments
  • 21:36 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 21:36 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1046
  • 21:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P80990 and previous config saved to /var/cache/conftool/dbconfig/20250811-213612-fceratto.json
  • 21:32 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1046
  • 21:32 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:32 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1046 - vriley@cumin1002"
  • 21:32 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1046 - vriley@cumin1002"
  • 21:29 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 21:27 sbassett: Deployed security fix for T399627 (#2)
  • 21:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T399249)', diff saved to https://phabricator.wikimedia.org/P80989 and previous config saved to /var/cache/conftool/dbconfig/20250811-212105-fceratto.json
  • 21:19 sbassett: Deployed security fix for T397580
  • 21:03 zabe@deploy1003: Finished scap sync-world: Backport for Activate rkiwiki (T392490) (duration: 08m 54s)
  • 20:57 zabe@deploy1003: zabe: Continuing with sync
  • 20:56 zabe@deploy1003: zabe: Backport for Activate rkiwiki (T392490) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:54 zabe@deploy1003: Started scap sync-world: Backport for Activate rkiwiki (T392490)
  • 20:50 zabe@deploy1003: Started scap sync-world: Backport for Activate rkiwiki (T392490)
  • 20:27 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for rkiwiki (T392490), Initial configuration for minwikibooks (T395452), Initial configuration for zghwiktionary (T399684) (duration: 08m 42s)
  • 20:21 zabe@deploy1003: zabe: Continuing with sync
  • 20:20 zabe@deploy1003: zabe: Backport for Initial configuration for rkiwiki (T392490), Initial configuration for minwikibooks (T395452), Initial configuration for zghwiktionary (T399684) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:18 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for rkiwiki (T392490), Initial configuration for minwikibooks (T395452), Initial configuration for zghwiktionary (T399684)
  • 20:16 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for rkiwiki (T392490), Initial configuration for minwikibooks (T395452), Initial configuration for zghwiktionary (T399684)
  • 20:02 zabe@deploy1003: Finished scap sync-world: Backport for Activate madwikisource (T391747) (duration: 08m 14s)
  • 19:57 jhancock@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host es2049.codfw.wmnet with OS bookworm
  • 19:57 zabe@deploy1003: zabe: Continuing with sync
  • 19:56 zabe@deploy1003: zabe: Backport for Activate madwikisource (T391747) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:54 zabe@deploy1003: Started scap sync-world: Backport for Activate madwikisource (T391747)
  • 19:48 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for madwikisource (T391747) (duration: 07m 48s)
  • 19:43 zabe@deploy1003: zabe: Continuing with sync
  • 19:42 zabe@deploy1003: zabe: Backport for Initial configuration for madwikisource (T391747) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 19:40 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for madwikisource (T391747)
  • 19:24 ejegg: payments-wiki rolled back from d744ca5c to 0a1084a8
  • 19:16 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 19:15 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 19:15 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 18:54 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 18:50 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1047.eqiad.wmnet with reason: host reimage
  • 18:37 jhancock@cumin1002: START - Cookbook sre.hosts.reimage for host es2049.codfw.wmnet with OS bookworm
  • 18:21 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 18:21 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:17 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:16 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1047
  • 18:16 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1047
  • 18:15 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:12 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 17:28 jhancock@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['es2049']
  • 17:28 jhancock@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['es2049']
  • 17:27 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:23 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host es2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:20 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2009.codfw.wmnet
  • 17:20 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2009.codfw.wmnet
  • 16:46 jgreen@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 jgreen@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frdb1003.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1002"
  • 16:46 krinkle@deploy1003: Finished scap sync-world: Backport for tests: Improve false-positive testOnlyExistingWikis, WmfConfig: Document why 'preinstall' is indexed, manage-dblist: Remove mention of non-existant "preinstall-labs" (duration: 14m 26s)
  • 16:46 jgreen@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frdb1003.frack.eqiad.wmnet from DNS for decommissioning - jgreen@cumin1002"
  • 16:42 jgreen@cumin1002: START - Cookbook sre.dns.netbox
  • 16:40 krinkle@deploy1003: krinkle: Continuing with sync
  • 16:33 krinkle@deploy1003: krinkle: Backport for tests: Improve false-positive testOnlyExistingWikis, WmfConfig: Document why 'preinstall' is indexed, manage-dblist: Remove mention of non-existant "preinstall-labs" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:31 krinkle@deploy1003: Started scap sync-world: Backport for tests: Improve false-positive testOnlyExistingWikis, WmfConfig: Document why 'preinstall' is indexed, manage-dblist: Remove mention of non-existant "preinstall-labs"
  • 16:29 krinkle@deploy1003: Finished scap sync-world: Backport for Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882) (duration: 09m 52s)
  • 16:26 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2020.codfw.wmnet with OS bullseye
  • 16:26 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 16:26 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 16:24 krinkle@deploy1003: krinkle: Continuing with sync
  • 16:21 krinkle@deploy1003: krinkle: Backport for Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:19 krinkle@deploy1003: Started scap sync-world: Backport for Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882)
  • 16:11 cdanis@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "feat: haproxy allhdrs - cdanis@cumin1003"
  • 16:11 cdanis@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: haproxy allhdrs - cdanis@cumin1003
  • 16:10 cdanis@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: feat: haproxy allhdrs - cdanis@cumin1003
  • 16:10 cdanis@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "feat: haproxy allhdrs - cdanis@cumin1003"
  • 16:02 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2020.codfw.wmnet with reason: host reimage
  • 16:01 jhancock@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host es2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 16:01 ejegg: payments-wiki upgraded from 0a1084a8 to d744ca5c
  • 15:52 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2020.codfw.wmnet with reason: host reimage
  • 15:52 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:52 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:49 jhancock@cumin1002: START - Cookbook sre.hosts.provision for host es2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:46 jhancock@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host es2049
  • 15:46 jhancock@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host es2049
  • 15:45 jhancock@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:45 jhancock@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2049 to codfw - jhancock@cumin1002"
  • 15:45 jhancock@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding es2049 to codfw - jhancock@cumin1002"
  • 15:45 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:45 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:44 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:43 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:43 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:43 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2009.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
  • 15:42 jhancock@cumin1002: START - Cookbook sre.dns.netbox
  • 15:35 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2020.codfw.wmnet with OS bullseye
  • 15:30 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2009.codfw.wmnet
  • 15:30 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2009.codfw.wmnet
  • 15:29 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2004.codfw.wmnet
  • 15:29 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2004.codfw.wmnet
  • 15:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:24 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:23 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:19 moritzm: installing node-form-data security updates
  • 15:12 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:09 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2004.codfw.wmnet
  • 15:09 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2004.codfw.wmnet
  • 15:06 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2003.codfw.wmnet
  • 15:06 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2003.codfw.wmnet
  • 15:01 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:55 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 14:52 brouberol: kafka-jumbo1007->9 are now decommissioned - T397447
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1009.eqiad.wmnet
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1009.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:51 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1009.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:47 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 14:46 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 14:41 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1009.eqiad.wmnet
  • 14:39 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1008.eqiad.wmnet
  • 14:39 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1008.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:39 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1008.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:36 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2003.codfw.wmnet
  • 14:36 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2003.codfw.wmnet
  • 14:35 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2002.codfw.wmnet
  • 14:35 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2002.codfw.wmnet
  • 14:35 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 14:34 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:34 sukhe@dns1004: END - running authdns-update
  • 14:33 sukhe@dns1004: START - running authdns-update
  • 14:27 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1008.eqiad.wmnet
  • 14:27 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-jumbo1007.eqiad.wmnet
  • 14:27 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:27 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:26 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-jumbo1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:23 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 14:22 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 14:20 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2002.codfw.wmnet
  • 14:20 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2002.codfw.wmnet
  • 14:20 klausman@cumin1003: END (ERROR) - Cookbook sre.k8s.pool-depool-node (exit_code=97) depool for host ml-serve2002.codfw.wmnet
  • 14:17 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts kafka-jumbo1007.eqiad.wmnet
  • 14:10 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2002.codfw.wmnet
  • 14:08 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host ml-serve2001.codfw.wmnet
  • 14:08 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host ml-serve2001.codfw.wmnet
  • 14:02 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:02 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-serve2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 13:53 klausman@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host ml-serve2001.codfw.wmnet
  • 13:53 klausman@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host ml-serve2001.codfw.wmnet
  • 13:53 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:41 phuedx@deploy1003: Finished scap sync-world: Backport for Analytics - Refine eventlogging_MediaWikiPingback (T369845) (duration: 08m 16s)
  • 13:36 phuedx@deploy1003: aqu, phuedx: Continuing with sync
  • 13:35 phuedx@deploy1003: aqu, phuedx: Backport for Analytics - Refine eventlogging_MediaWikiPingback (T369845) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:33 phuedx@deploy1003: Started scap sync-world: Backport for Analytics - Refine eventlogging_MediaWikiPingback (T369845)
  • 13:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T399249)', diff saved to https://phabricator.wikimedia.org/P80986 and previous config saved to /var/cache/conftool/dbconfig/20250811-133201-fceratto.json
  • 13:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 13:30 stran@deploy1003: Finished scap sync-world: Backport for Defer to * group for per-wiki temp account permissions (T400672) (duration: 15m 12s)
  • 13:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 13:24 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:22 stran@deploy1003: stran: Continuing with sync
  • 13:20 stran@deploy1003: stran: Backport for Defer to * group for per-wiki temp account permissions (T400672) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:14 stran@deploy1003: Started scap sync-world: Backport for Defer to * group for per-wiki temp account permissions (T400672)
  • 13:12 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:11 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:10 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:06 stran@deploy1003: Started scap sync-world: Backport for Defer to * group for per-wiki temp account permissions (T400672)
  • 13:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:05 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:04 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:02 moritzm: installing libcommons-lang-java security updates
  • 13:02 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:02 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:01 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:01 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 13:01 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 12:49 zabe@deploy1003: Finished scap sync-world: Backport for Stop writing to cl_to and cl_collation on small wikis (T399579) (duration: 43m 30s)
  • 12:48 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:40 hashar@deploy1003: Finished deploy [gerrit/gerrit@7d55b4f]: build: upgrade QUnit (duration: 00m 12s)
  • 12:39 hashar@deploy1003: Started deploy [gerrit/gerrit@7d55b4f]: build: upgrade QUnit
  • 12:36 zabe@deploy1003: zabe: Continuing with sync
  • 12:34 zabe@deploy1003: zabe: Backport for Stop writing to cl_to and cl_collation on small wikis (T399579) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:15 hashar@deploy1003: Finished deploy [integration/docroot@1c2af1f]: build: Upgrade eslint-config-wikimedia to 0.31.0 (duration: 00m 13s)
  • 12:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 12:15 hashar@deploy1003: Started deploy [integration/docroot@1c2af1f]: build: Upgrade eslint-config-wikimedia to 0.31.0
  • 12:05 zabe@deploy1003: Started scap sync-world: Backport for Stop writing to cl_to and cl_collation on small wikis (T399579)
  • 11:53 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:46 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
  • 11:33 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 11:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 11:31 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003
  • 11:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1046.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:24 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1046
  • 11:24 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1046
  • 11:23 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:22 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:22 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1026 - vriley@cumin1002"
  • 11:22 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt clouddb1026 - vriley@cumin1002"
  • 11:19 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 11:13 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Enable Java security updates - klausman@cumin1003
  • 11:10 klausman@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003
  • 11:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 11:01 moritzm: installing djvulibre security updates
  • 10:59 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:52 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 10:52 klausman@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Enable Java security updates - klausman@cumin1003
  • 10:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 10:49 taavi: manually built first trixie docker image T393173
  • 10:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 10:44 moritzm: installing batik security updates
  • 10:32 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bookworm
  • 10:30 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bookworm
  • 10:28 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 10:24 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 10:23 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 10:22 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 10:21 brouberol@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 10:20 brouberol@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 10:20 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Java updates - jmm@cumin2002
  • 10:20 brouberol@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 10:14 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:14 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:13 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:13 brouberol@deploy1003: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:10 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 10:08 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:07 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 10:06 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:05 brouberol@deploy1003: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 10:04 brouberol@deploy1003: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 10:04 brouberol@deploy1003: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 10:04 brouberol: redeploying eventstreams-internal to remove references to soon-to-be decommissioned brokers - T397447
  • 10:01 brouberol@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 10:00 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Java updates - jmm@cumin2002
  • 09:58 brouberol@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 09:57 brouberol@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:57 brouberol@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 09:57 brouberol: redeploying eventgate-analytics to remove references to soon-to-be decommissioned brokers - T397447
  • 09:55 brouberol@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 09:55 brouberol@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 09:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tlwikisource in section s5
  • 09:38 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
  • 09:38 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
  • 09:38 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
  • 09:37 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 09:37 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 09:36 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 09:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 09:35 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 09:34 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tlwikisource in section s5
  • 09:32 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Managing sanitization for wikis tlwikisource in section s5
  • 09:31 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 09:31 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 09:29 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:28 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:27 cgoubert@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 09:27 cgoubert@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 09:25 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Managing sanitization for wikis tlwikisource in section s5
  • 09:25 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.sanitize-wiki (exit_code=0) Checking sanitization for wikis tlwikisource in section s5
  • 09:20 fceratto@cumin1002: START - Cookbook sre.mysql.sanitize-wiki Checking sanitization for wikis tlwikisource in section s5
  • 09:13 joelyrookewmde: Finished populateSitesTable for tlwikisource [as per T388658]
  • 09:09 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 09:04 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 08:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2229.codfw.wmnet with reason: Maintenance
  • 08:50 joelyrookewmde: joelyrookewmde@deploy1003:~$ foreachwikiindblist wikidataclient extensions/Wikibase/lib/maintenance/populateSitesTable.php --force-protocol https
  • 08:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1013.eqiad.wmnet
  • 08:36 vgutierrez: reducing haproxykafka socket batch deadline to 500ms - T400039
  • 08:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1013.eqiad.wmnet
  • 08:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1012.eqiad.wmnet
  • 08:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-serve1012.eqiad.wmnet
  • 08:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1003.eqiad.wmnet
  • 08:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
  • 08:10 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
  • 08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1003.eqiad.wmnet
  • 07:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
  • 07:54 moritzm: installing openjdk-11 security updates
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 43s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-10

  • 08:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 08:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 28s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-09

  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 23s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-08

  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 12m 11s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:19 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1047.eqiad.wmnet with OS bullseye
  • 00:03 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2025-08-07

  • 23:38 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1047.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 23:38 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1047
  • 23:37 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1047
  • 23:37 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:37 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1047 - vriley@cumin1002"
  • 23:37 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1047 - vriley@cumin1002"
  • 23:33 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 22:59 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 22:59 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:58 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 22:38 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 22:34 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1042.eqiad.wmnet with reason: host reimage
  • 22:15 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:15 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 22:09 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2018.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:16 jgleeson: payments-wiki upgraded from 0ab5bab9 to 0a1084a8
  • 21:15 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2018.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:15 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1016.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:14 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 21:01 swfrench@deploy1003: Finished scap sync-world: No-op deployment to clear chart version diffs from https://gerrit.wikimedia.org/r/1176543 (duration: 02m 45s)
  • 20:58 swfrench@deploy1003: Started scap sync-world: No-op deployment to clear chart version diffs from https://gerrit.wikimedia.org/r/1176543
  • 20:30 cjming@deploy1003: Finished scap sync-world: Backport for Update PageVisit instruments for a logged-in synth experiment (T397140) (duration: 07m 34s)
  • 20:25 cjming@deploy1003: cjming: Continuing with sync
  • 20:24 cjming@deploy1003: cjming: Backport for Update PageVisit instruments for a logged-in synth experiment (T397140) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:23 cjming@deploy1003: Started scap sync-world: Backport for Update PageVisit instruments for a logged-in synth experiment (T397140)
  • 20:10 cjming@deploy1003: Finished scap sync-world: Backport for XLab/Hooks: Only fetch experiment configs when user is registered (duration: 08m 05s)
  • 20:05 cjming@deploy1003: cjming: Continuing with sync
  • 20:04 cjming@deploy1003: cjming: Backport for XLab/Hooks: Only fetch experiment configs when user is registered synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:02 cjming@deploy1003: Started scap sync-world: Backport for XLab/Hooks: Only fetch experiment configs when user is registered
  • 20:01 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2020.codfw.wmnet with OS bullseye
  • 19:55 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 19:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2019.codfw.wmnet with OS bullseye
  • 19:28 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 19:28 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 19:07 rzl@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
  • 19:07 rzl@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
  • 19:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2019.codfw.wmnet with reason: host reimage
  • 18:57 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2019.codfw.wmnet with reason: host reimage
  • 18:50 cjming@deploy1003: mwscript-k8s job started: extensions/MetricsPlatform/maintenance/UpdateConfigs.php --wiki aawiki # Test run for T398422
  • 18:41 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2020.codfw.wmnet with OS bullseye
  • 18:41 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2019.codfw.wmnet with OS bullseye
  • 18:24 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 18:23 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 18:16 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 18:07 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:59 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:54 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 17:32 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:29 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 17:27 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:26 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 17:11 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 16:51 dancy@deploy1003: Installation of scap version "4.198.0" completed for 2 hosts
  • 16:49 dancy@deploy1003: Installing scap version "4.198.0" for 2 host(s)
  • 16:26 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy2003.codfw.wmnet with OS bookworm
  • 16:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2018.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:57 zabe@deploy1003: Finished scap sync-world: update interwiki cache (duration: 07m 29s)
  • 15:56 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2018.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:55 jhancock@cumin1003: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 15:50 zabe@deploy1003: Started scap sync-world: update interwiki cache
  • 15:49 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2018.codfw.wmnet with OS bullseye
  • 15:49 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 15:49 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 15:46 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2017.codfw.wmnet with OS bullseye
  • 15:46 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 15:46 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 15:44 zabe@deploy1003: Finished scap sync-world: Backport for Activate tlwikisource (T388639) (duration: 07m 47s)
  • 15:38 zabe@deploy1003: zabe: Continuing with sync
  • 15:38 zabe@deploy1003: zabe: Backport for Activate tlwikisource (T388639) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:36 zabe@deploy1003: Started scap sync-world: Backport for Activate tlwikisource (T388639)
  • 15:33 zabe: Create Wikisource Tagalog # T388639
  • 15:31 zabe@deploy1003: Finished scap sync-world: Backport for Initial configuration for tlwikisource (T388639) (duration: 07m 45s)
  • 15:31 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2018.codfw.wmnet with reason: host reimage
  • 15:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2017.codfw.wmnet with reason: host reimage
  • 15:26 zabe@deploy1003: zabe: Continuing with sync
  • 15:26 zabe@deploy1003: zabe: Backport for Initial configuration for tlwikisource (T388639) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 15:24 zabe@deploy1003: Started scap sync-world: Backport for Initial configuration for tlwikisource (T388639)
  • 15:23 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2018.codfw.wmnet with reason: host reimage
  • 15:23 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2017.codfw.wmnet with reason: host reimage
  • 15:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1263.eqiad.wmnet with OS bookworm
  • 15:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1262.eqiad.wmnet with OS bookworm
  • 15:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:08 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2018.codfw.wmnet with OS bullseye
  • 15:08 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe2017.codfw.wmnet with OS bullseye
  • 15:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host deploy2003.codfw.wmnet with OS bookworm
  • 15:07 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['deploy2003']
  • 15:06 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['deploy2003']
  • 15:05 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1261.eqiad.wmnet with OS bookworm
  • 15:05 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:02 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1260.eqiad.wmnet with OS bookworm
  • 15:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:59 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:59 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1263.eqiad.wmnet with reason: host reimage
  • 14:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:54 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host deploy2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 14:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2020
  • 14:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2019
  • 14:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2018
  • 14:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2017
  • 14:53 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy2003
  • 14:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2020
  • 14:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2019
  • 14:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2018
  • 14:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2017
  • 14:53 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host deploy2003
  • 14:52 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2020 to codfw - jhancock@cumin1003"
  • 14:52 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2020 to codfw - jhancock@cumin1003"
  • 14:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1262.eqiad.wmnet with reason: host reimage
  • 14:49 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1261.eqiad.wmnet with reason: host reimage
  • 14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1260.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1263.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1262.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1261.eqiad.wmnet with reason: host reimage
  • 14:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1260.eqiad.wmnet with reason: host reimage
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1263.eqiad.wmnet with OS bookworm
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1262.eqiad.wmnet with OS bookworm
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1261.eqiad.wmnet with OS bookworm
  • 14:24 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1260.eqiad.wmnet with OS bookworm
  • 14:24 zabe@deploy1003: Finished scap sync-world: Backport for Do not create a database table when a different provider is used (T397348), Do not create a database table when a different provider is used (T397348) (duration: 07m 54s)
  • 14:18 zabe@deploy1003: zabe: Continuing with sync
  • 14:18 zabe@deploy1003: zabe: Backport for Do not create a database table when a different provider is used (T397348), Do not create a database table when a different provider is used (T397348) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:16 zabe@deploy1003: Started scap sync-world: Backport for Do not create a database table when a different provider is used (T397348), Do not create a database table when a different provider is used (T397348)
  • 14:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1263.eqiad.wmnet with OS bookworm
  • 14:02 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1262.eqiad.wmnet with OS bookworm
  • 14:01 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1091.eqiad.wmnet with OS bullseye
  • 13:49 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2248.codfw.wmnet with OS bookworm
  • 13:49 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 13:47 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 13:44 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 13:41 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 13:30 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2248.codfw.wmnet with reason: host reimage
  • 13:29 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
  • 13:28 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:26 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2248.codfw.wmnet with reason: host reimage
  • 13:16 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Enable wgParserEnableUserLanguage for incubatorwiki (duration: 09m 37s)
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1263.eqiad.wmnet with OS bookworm
  • 13:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1262.eqiad.wmnet with OS bookworm
  • 13:11 lucaswerkmeister-wmde@deploy1003: jhsoby, lucaswerkmeister-wmde: Continuing with sync
  • 13:09 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2248.codfw.wmnet with OS bookworm
  • 13:09 lucaswerkmeister-wmde@deploy1003: jhsoby, lucaswerkmeister-wmde: Backport for Enable wgParserEnableUserLanguage for incubatorwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:07 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:07 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Enable wgParserEnableUserLanguage for incubatorwiki
  • 13:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1262.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 13:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:55 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2247.codfw.wmnet with OS bookworm
  • 12:55 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:55 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:54 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2246.codfw.wmnet with OS bookworm
  • 12:54 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:51 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:49 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T399728)', diff saved to https://phabricator.wikimedia.org/P80972 and previous config saved to /var/cache/conftool/dbconfig/20250807-124728-fceratto.json
  • 12:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1260.eqiad.wmnet with OS bookworm
  • 12:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1261.eqiad.wmnet with OS bookworm
  • 12:45 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2245.codfw.wmnet with OS bookworm
  • 12:45 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:45 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1261.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1260.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2247.codfw.wmnet with reason: host reimage
  • 12:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P80970 and previous config saved to /var/cache/conftool/dbconfig/20250807-123220-fceratto.json
  • 12:32 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2246.codfw.wmnet with reason: host reimage
  • 12:28 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2245.codfw.wmnet with reason: host reimage
  • 12:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1262.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2247.codfw.wmnet with reason: host reimage
  • 12:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2246.codfw.wmnet with reason: host reimage
  • 12:24 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2245.codfw.wmnet with reason: host reimage
  • 12:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1262.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P80969 and previous config saved to /var/cache/conftool/dbconfig/20250807-121712-fceratto.json
  • 12:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1262.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1263.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1261.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1260.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for db1260-3 - jclark@cumin1002"
  • 12:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for db1260-3 - jclark@cumin1002"
  • 12:08 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2247.codfw.wmnet with OS bookworm
  • 12:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2246.codfw.wmnet with OS bookworm
  • 12:07 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2245.codfw.wmnet with OS bookworm
  • 12:05 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T399728)', diff saved to https://phabricator.wikimedia.org/P80967 and previous config saved to /var/cache/conftool/dbconfig/20250807-120205-fceratto.json
  • 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T399728)', diff saved to https://phabricator.wikimedia.org/P80966 and previous config saved to /var/cache/conftool/dbconfig/20250807-115646-fceratto.json
  • 11:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:56 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T399728)', diff saved to https://phabricator.wikimedia.org/P80965 and previous config saved to /var/cache/conftool/dbconfig/20250807-115606-fceratto.json
  • 11:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P80964 and previous config saved to /var/cache/conftool/dbconfig/20250807-114058-fceratto.json
  • 11:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P80963 and previous config saved to /var/cache/conftool/dbconfig/20250807-112551-fceratto.json
  • 11:21 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 11:19 claime: deploy1003:~# lvextend -L+30G /dev/vg0/srv
  • 11:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T399728)', diff saved to https://phabricator.wikimedia.org/P80961 and previous config saved to /var/cache/conftool/dbconfig/20250807-111043-fceratto.json
  • 11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T399728)', diff saved to https://phabricator.wikimedia.org/P80960 and previous config saved to /var/cache/conftool/dbconfig/20250807-110549-fceratto.json
  • 11:05 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T399728)', diff saved to https://phabricator.wikimedia.org/P80959 and previous config saved to /var/cache/conftool/dbconfig/20250807-110527-fceratto.json
  • 10:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P80958 and previous config saved to /var/cache/conftool/dbconfig/20250807-105019-fceratto.json
  • 10:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P80957 and previous config saved to /var/cache/conftool/dbconfig/20250807-103512-fceratto.json
  • 10:24 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 10:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T399728)', diff saved to https://phabricator.wikimedia.org/P80956 and previous config saved to /var/cache/conftool/dbconfig/20250807-102004-fceratto.json
  • 10:16 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:15 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T399728)', diff saved to https://phabricator.wikimedia.org/P80955 and previous config saved to /var/cache/conftool/dbconfig/20250807-101515-fceratto.json
  • 10:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 10:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T399728)', diff saved to https://phabricator.wikimedia.org/P80954 and previous config saved to /var/cache/conftool/dbconfig/20250807-101452-fceratto.json
  • 09:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P80953 and previous config saved to /var/cache/conftool/dbconfig/20250807-095945-fceratto.json
  • 09:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P80952 and previous config saved to /var/cache/conftool/dbconfig/20250807-094437-fceratto.json
  • 09:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T399728)', diff saved to https://phabricator.wikimedia.org/P80951 and previous config saved to /var/cache/conftool/dbconfig/20250807-092930-fceratto.json
  • 09:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T399728)', diff saved to https://phabricator.wikimedia.org/P80950 and previous config saved to /var/cache/conftool/dbconfig/20250807-092433-fceratto.json
  • 09:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T399728)', diff saved to https://phabricator.wikimedia.org/P80949 and previous config saved to /var/cache/conftool/dbconfig/20250807-092410-fceratto.json
  • 09:09 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@417d4e8] (releasing): T400645 (duration: 00m 31s)
  • 09:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P80948 and previous config saved to /var/cache/conftool/dbconfig/20250807-090903-fceratto.json
  • 09:08 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@417d4e8] (releasing): T400645
  • 09:04 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1045.eqiad.wmnet with OS bullseye
  • 09:03 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P80947 and previous config saved to /var/cache/conftool/dbconfig/20250807-085355-fceratto.json
  • 08:51 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:45 brouberol@cumin1003: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
  • 08:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:41 hashar@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.45.0-wmf.13 refs T396374
  • 08:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T399728)', diff saved to https://phabricator.wikimedia.org/P80946 and previous config saved to /var/cache/conftool/dbconfig/20250807-083848-fceratto.json
  • 08:37 brouberol@cumin1003: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
  • 08:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T399728)', diff saved to https://phabricator.wikimedia.org/P80945 and previous config saved to /var/cache/conftool/dbconfig/20250807-083348-fceratto.json
  • 08:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80944 and previous config saved to /var/cache/conftool/dbconfig/20250807-083325-fceratto.json
  • 08:25 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[1061-1063].eqiad.wmnet
  • 08:25 mvernon@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:25 mvernon@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1061-1063].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003"
  • 08:20 mvernon@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[1061-1063].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin1003"
  • 08:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P80943 and previous config saved to /var/cache/conftool/dbconfig/20250807-081818-fceratto.json
  • 08:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 08:17 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
  • 08:15 mvernon@cumin1003: START - Cookbook sre.dns.netbox
  • 08:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
  • 08:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
  • 08:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
  • 08:13 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
  • 08:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
  • 08:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
  • 08:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
  • 08:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
  • 08:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
  • 08:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
  • 08:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
  • 08:06 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
  • 08:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
  • 08:04 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
  • 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P80942 and previous config saved to /var/cache/conftool/dbconfig/20250807-080311-fceratto.json
  • 08:02 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 08:02 mvernon@cumin1003: START - Cookbook sre.hosts.decommission for hosts ms-be[1061-1063].eqiad.wmnet
  • 08:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
  • 08:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
  • 07:59 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1045
  • 07:59 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1045
  • 07:58 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:58 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1045 - vriley@cumin1002"
  • 07:57 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1045 - vriley@cumin1002"
  • 07:54 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 07:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
  • 07:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80941 and previous config saved to /var/cache/conftool/dbconfig/20250807-074803-fceratto.json
  • 07:43 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T399728)', diff saved to https://phabricator.wikimedia.org/P80940 and previous config saved to /var/cache/conftool/dbconfig/20250807-074306-fceratto.json
  • 07:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 07:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 06:34 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 06:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 05:36 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1015.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 05:30 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 05:17 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 05:16 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 04:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 04:07 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 01:39 ejegg: fundraising civicrm upgraded from e591fe72 to ebb98a9e
  • 01:12 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 24s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-06

  • 22:09 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
  • 21:57 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:50 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:15 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 21:14 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
  • 21:14 ryankemper@cumin2002: START - Cookbook sre.wdqs.restart
  • 20:59 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1014.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 20:53 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2015.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:43 dancy@deploy1003: Installation of scap version "4.197.1" completed for 1 hosts
  • 20:42 dancy@deploy1003: Installing scap version "4.197.1" for 1 host(s)
  • 20:38 dancy@deploy1003: Installing scap version "4.197.1" for 169 host(s)
  • 20:09 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2247.codfw.wmnet with OS bookworm
  • 20:08 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2246.codfw.wmnet with OS bookworm
  • 20:07 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2245.codfw.wmnet with OS bookworm
  • 19:54 papaul: maintenance going on on msw1-eqiad
  • 19:30 swfrench@deploy1003: Finished scap sync-world: No-op deployment to configure PHP 8.3 image builds - T399884 (duration: 19m 22s)
  • 19:11 swfrench@deploy1003: Started scap sync-world: No-op deployment to configure PHP 8.3 image builds - T399884
  • 18:55 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2247.codfw.wmnet with OS bookworm
  • 18:55 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2246.codfw.wmnet with OS bookworm
  • 18:54 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host db2245.codfw.wmnet with OS bookworm
  • 18:52 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2245']
  • 18:52 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2245']
  • 18:17 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 18:13 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 18:03 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2245.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:56 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new 8.1.33-1-s3 production images - T383047 (duration: 45m 10s)
  • 17:44 swfrench@deploy1003: swfrench: Continuing with sync
  • 17:32 swfrench@deploy1003: swfrench: Deployment to pick up new 8.1.33-1-s3 production images - T383047 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:30 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2245.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 17:20 amastilovic@deploy1003: Finished deploy [analytics/refinery@2178dda] (thin): Updates to sqoop THIN [analytics/refinery@2178dda8] (duration: 01m 08s)
  • 17:19 amastilovic@deploy1003: Started deploy [analytics/refinery@2178dda] (thin): Updates to sqoop THIN [analytics/refinery@2178dda8]
  • 17:19 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1013.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 17:19 amastilovic@deploy1003: Finished deploy [analytics/refinery@2178dda]: Updates to sqoop [analytics/refinery@2178dda8] (duration: 02m 29s)
  • 17:17 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2014.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:16 amastilovic@deploy1003: Started deploy [analytics/refinery@2178dda]: Updates to sqoop [analytics/refinery@2178dda8]
  • 17:15 amastilovic@deploy1003: Finished deploy [analytics/refinery@2178dda] (hadoop-test): Updates to sqoop TEST [analytics/refinery@2178dda8] (duration: 00m 53s)
  • 17:15 amastilovic@deploy1003: Started deploy [analytics/refinery@2178dda] (hadoop-test): Updates to sqoop TEST [analytics/refinery@2178dda8]
  • 17:11 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new 8.1.33-1-s3 production images - T383047
  • 17:10 swfrench-wmf: built and published php8.1 production image stack at 8.1.33-1-s3 - T383047
  • 17:03 swfrench-wmf: reprepro include php8.1_8.1.33-1+wmf11u2 in component/php81 - T383047
  • 16:49 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 16:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T399728)', diff saved to https://phabricator.wikimedia.org/P80935 and previous config saved to /var/cache/conftool/dbconfig/20250806-164846-fceratto.json
  • 16:37 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 16:34 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 16:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P80934 and previous config saved to /var/cache/conftool/dbconfig/20250806-163338-fceratto.json
  • 16:33 cjming@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:32 cjming@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P80931 and previous config saved to /var/cache/conftool/dbconfig/20250806-161831-fceratto.json
  • 16:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T399728)', diff saved to https://phabricator.wikimedia.org/P80930 and previous config saved to /var/cache/conftool/dbconfig/20250806-160323-fceratto.json
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1251 (T399728)', diff saved to https://phabricator.wikimedia.org/P80929 and previous config saved to /var/cache/conftool/dbconfig/20250806-155939-fceratto.json
  • 15:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1251.eqiad.wmnet with reason: Maintenance
  • 15:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 15:55 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 15:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T399728)', diff saved to https://phabricator.wikimedia.org/P80928 and previous config saved to /var/cache/conftool/dbconfig/20250806-155540-fceratto.json
  • 15:48 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 15:48 dancy@deploy1003: Installation of scap version "4.197.0" completed for 169 hosts
  • 15:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 15:42 dancy@deploy1003: Installing scap version "4.197.0" for 169 host(s)
  • 15:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P80927 and previous config saved to /var/cache/conftool/dbconfig/20250806-154032-fceratto.json
  • 15:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P80926 and previous config saved to /var/cache/conftool/dbconfig/20250806-152524-fceratto.json
  • 15:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T399728)', diff saved to https://phabricator.wikimedia.org/P80925 and previous config saved to /var/cache/conftool/dbconfig/20250806-151017-fceratto.json
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T399728)', diff saved to https://phabricator.wikimedia.org/P80924 and previous config saved to /var/cache/conftool/dbconfig/20250806-150631-fceratto.json
  • 15:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 15:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T399728)', diff saved to https://phabricator.wikimedia.org/P80923 and previous config saved to /var/cache/conftool/dbconfig/20250806-150609-fceratto.json
  • 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P80922 and previous config saved to /var/cache/conftool/dbconfig/20250806-145101-fceratto.json
  • 14:50 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1012.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 14:50 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2013.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 14:35 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P80921 and previous config saved to /var/cache/conftool/dbconfig/20250806-143554-fceratto.json
  • 14:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T399728)', diff saved to https://phabricator.wikimedia.org/P80920 and previous config saved to /var/cache/conftool/dbconfig/20250806-142046-fceratto.json
  • 14:19 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 14:18 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 14:17 jgiannelos@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:17 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T399728)', diff saved to https://phabricator.wikimedia.org/P80919 and previous config saved to /var/cache/conftool/dbconfig/20250806-141701-fceratto.json
  • 14:16 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 14:16 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T399728)', diff saved to https://phabricator.wikimedia.org/P80918 and previous config saved to /var/cache/conftool/dbconfig/20250806-141638-fceratto.json
  • 14:16 jgiannelos@deploy1003: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:15 jgiannelos@deploy1003: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:14 jgiannelos@deploy1003: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:14 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 14:14 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 14:14 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:14 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:13 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:13 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 gengh@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 gengh@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:07 gengh@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 gengh@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:06 gengh@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 gengh@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:01 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P80917 and previous config saved to /var/cache/conftool/dbconfig/20250806-140130-fceratto.json
  • 13:50 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov2007.codfw.wmnet with OS bookworm
  • 13:50 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 13:49 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin1003"
  • 13:46 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P80916 and previous config saved to /var/cache/conftool/dbconfig/20250806-134623-fceratto.json
  • 13:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2088.codfw.wmnet with OS bullseye
  • 13:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T399728)', diff saved to https://phabricator.wikimedia.org/P80915 and previous config saved to /var/cache/conftool/dbconfig/20250806-133115-fceratto.json
  • 13:29 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1091.eqiad.wmnet with OS bullseye
  • 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T399728)', diff saved to https://phabricator.wikimedia.org/P80914 and previous config saved to /var/cache/conftool/dbconfig/20250806-132725-fceratto.json
  • 13:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 13:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T399728)', diff saved to https://phabricator.wikimedia.org/P80913 and previous config saved to /var/cache/conftool/dbconfig/20250806-132703-fceratto.json
  • 13:25 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov2007.codfw.wmnet with reason: host reimage
  • 13:21 jhancock@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov2007.codfw.wmnet with reason: host reimage
  • 13:20 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs1011.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 13:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
  • 13:18 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2012.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 13:14 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:14 mvernon@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 13:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P80912 and previous config saved to /var/cache/conftool/dbconfig/20250806-131155-fceratto.json
  • 13:11 Reedy: ran `foreachwiki extensions/Nuke/maintenance/normalizeNukeTags.php` T381598
  • 13:11 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2088.codfw.wmnet with reason: host reimage
  • 13:08 mvernon@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1091.eqiad.wmnet with reason: host reimage
  • 13:05 brouberol: committing new homer config to add dse-k8s-worker101[5-9] to the bgp groups
  • 13:05 reedy@deploy1003: Finished scap sync-world: Backport for Add maintenance script to recapitalize 'Nuke' tags (T381598) (duration: 08m 18s)
  • 13:04 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host dbprov2007.codfw.wmnet with OS bookworm
  • 12:59 reedy@deploy1003: chlod, reedy: Continuing with sync
  • 12:59 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:58 reedy@deploy1003: chlod, reedy: Backport for Add maintenance script to recapitalize 'Nuke' tags (T381598) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:58 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:57 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2088.codfw.wmnet with OS bullseye
  • 12:56 reedy@deploy1003: Started scap sync-world: Backport for Add maintenance script to recapitalize 'Nuke' tags (T381598)
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P80911 and previous config saved to /var/cache/conftool/dbconfig/20250806-125648-fceratto.json
  • 12:56 mvernon@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1091.eqiad.wmnet with OS bullseye
  • 12:55 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2247.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:53 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2246.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:49 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 12:42 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T399728)', diff saved to https://phabricator.wikimedia.org/P80910 and previous config saved to /var/cache/conftool/dbconfig/20250806-124140-fceratto.json
  • 12:41 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 12:39 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2245.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T399728)', diff saved to https://phabricator.wikimedia.org/P80909 and previous config saved to /var/cache/conftool/dbconfig/20250806-123751-fceratto.json
  • 12:37 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:37 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 12:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80908 and previous config saved to /var/cache/conftool/dbconfig/20250806-123738-fceratto.json
  • 12:37 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2247.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:36 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2246.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:36 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host db2245.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2248
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2247
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2246
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host db2245
  • 12:35 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2248
  • 12:35 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2247
  • 12:35 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2246
  • 12:35 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host db2245
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:35 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2245 to codfw - jhancock@cumin1003"
  • 12:35 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2245 to codfw - jhancock@cumin1003"
  • 12:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:33 Reedy: run namespaceDupes.php on thwiki T401287
  • 12:32 reedy@deploy1003: Finished scap sync-world: Backport for thwiki: add WT namespace alias (T401287) (duration: 08m 56s)
  • 12:31 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 12:29 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 12:26 reedy@deploy1003: chlod, reedy: Continuing with sync
  • 12:25 reedy@deploy1003: chlod, reedy: Backport for thwiki: add WT namespace alias (T401287) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:23 reedy@deploy1003: Started scap sync-world: Backport for thwiki: add WT namespace alias (T401287)
  • 12:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P80907 and previous config saved to /var/cache/conftool/dbconfig/20250806-122231-fceratto.json
  • 12:18 btullis@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:17 btullis@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:13 btullis@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:11 btullis@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P80906 and previous config saved to /var/cache/conftool/dbconfig/20250806-120723-fceratto.json
  • 11:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80905 and previous config saved to /var/cache/conftool/dbconfig/20250806-115216-fceratto.json
  • 11:44 btullis@deploy1003: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: sync
  • 11:44 btullis@deploy1003: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: sync
  • 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T399728)', diff saved to https://phabricator.wikimedia.org/P80904 and previous config saved to /var/cache/conftool/dbconfig/20250806-113633-fceratto.json
  • 11:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T399728)', diff saved to https://phabricator.wikimedia.org/P80903 and previous config saved to /var/cache/conftool/dbconfig/20250806-113609-fceratto.json
  • 11:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P80902 and previous config saved to /var/cache/conftool/dbconfig/20250806-112102-fceratto.json
  • 11:15 btullis@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:14 btullis@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:09 btullis@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P80901 and previous config saved to /var/cache/conftool/dbconfig/20250806-110555-fceratto.json
  • 11:00 btullis@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:58 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:58 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:50 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T399728)', diff saved to https://phabricator.wikimedia.org/P80900 and previous config saved to /var/cache/conftool/dbconfig/20250806-105047-fceratto.json
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T399728)', diff saved to https://phabricator.wikimedia.org/P80899 and previous config saved to /var/cache/conftool/dbconfig/20250806-104805-fceratto.json
  • 10:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T399728)', diff saved to https://phabricator.wikimedia.org/P80898 and previous config saved to /var/cache/conftool/dbconfig/20250806-104743-fceratto.json
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P80897 and previous config saved to /var/cache/conftool/dbconfig/20250806-103235-fceratto.json
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P80896 and previous config saved to /var/cache/conftool/dbconfig/20250806-101728-fceratto.json
  • 10:03 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
  • 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T399728)', diff saved to https://phabricator.wikimedia.org/P80895 and previous config saved to /var/cache/conftool/dbconfig/20250806-100220-fceratto.json
  • 10:00 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T399728)', diff saved to https://phabricator.wikimedia.org/P80894 and previous config saved to /var/cache/conftool/dbconfig/20250806-095839-fceratto.json
  • 09:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:58 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 09:58 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T399728)', diff saved to https://phabricator.wikimedia.org/P80893 and previous config saved to /var/cache/conftool/dbconfig/20250806-095758-fceratto.json
  • 09:54 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
  • 09:54 hashar@deploy1003: Finished scap sync-world: Backport for ExperimentManager: Fix #getExperiment() when uninitialized (T401294) (duration: 08m 20s)
  • 09:50 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
  • 09:48 hashar@deploy1003: hashar: Continuing with sync
  • 09:47 hashar@deploy1003: hashar: Backport for ExperimentManager: Fix #getExperiment() when uninitialized (T401294) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:45 hashar@deploy1003: Started scap sync-world: Backport for ExperimentManager: Fix #getExperiment() when uninitialized (T401294)
  • 09:45 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1019.eqiad.wmnet with reason: host reimage
  • 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P80892 and previous config saved to /var/cache/conftool/dbconfig/20250806-094250-fceratto.json
  • 09:41 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1019.eqiad.wmnet with reason: host reimage
  • 09:38 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1018.eqiad.wmnet with reason: host reimage
  • 09:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
  • 09:34 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1018.eqiad.wmnet with reason: host reimage
  • 09:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
  • 09:33 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1017.eqiad.wmnet with reason: host reimage
  • 09:31 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1016.eqiad.wmnet with reason: host reimage
  • 09:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P80891 and previous config saved to /var/cache/conftool/dbconfig/20250806-092743-fceratto.json
  • 09:20 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1019.eqiad.wmnet with OS bookworm
  • 09:19 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1018.eqiad.wmnet with OS bookworm
  • 09:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
  • 09:18 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1017.eqiad.wmnet with OS bookworm
  • 09:15 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1016.eqiad.wmnet with OS bookworm
  • 09:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T399728)', diff saved to https://phabricator.wikimedia.org/P80890 and previous config saved to /var/cache/conftool/dbconfig/20250806-091235-fceratto.json
  • 09:12 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1015.eqiad.wmnet with reason: host reimage
  • 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T399728)', diff saved to https://phabricator.wikimedia.org/P80889 and previous config saved to /var/cache/conftool/dbconfig/20250806-090856-fceratto.json
  • 09:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 09:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T399728)', diff saved to https://phabricator.wikimedia.org/P80888 and previous config saved to /var/cache/conftool/dbconfig/20250806-090833-fceratto.json
  • 09:07 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:07 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 08:56 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1015.eqiad.wmnet with OS bookworm
  • 08:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1016 to dse-k8s-worker1019
  • 08:54 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1019
  • 08:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P80887 and previous config saved to /var/cache/conftool/dbconfig/20250806-085326-fceratto.json
  • 08:52 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1019
  • 08:52 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1019 on all recursors
  • 08:52 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1019 on all recursors
  • 08:52 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1016 to dse-k8s-worker1019 - btullis@cumin1003"
  • 08:52 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1016 to dse-k8s-worker1019 - btullis@cumin1003"
  • 08:48 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 08:45 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1016 to dse-k8s-worker1019
  • 08:39 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from snapshot1016 to dse-k8s-worker1019
  • 08:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P80886 and previous config saved to /var/cache/conftool/dbconfig/20250806-083818-fceratto.json
  • 08:34 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1016 to dse-k8s-worker1019
  • 08:25 hashar@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.45.0-wmf.13 refs T396374
  • 08:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T399728)', diff saved to https://phabricator.wikimedia.org/P80885 and previous config saved to /var/cache/conftool/dbconfig/20250806-082311-fceratto.json
  • 08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T399728)', diff saved to https://phabricator.wikimedia.org/P80884 and previous config saved to /var/cache/conftool/dbconfig/20250806-081929-fceratto.json
  • 08:19 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 08:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T399728)', diff saved to https://phabricator.wikimedia.org/P80883 and previous config saved to /var/cache/conftool/dbconfig/20250806-081906-fceratto.json
  • 08:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 08:15 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 08:08 reedy@deploy1003: Finished scap sync-world: Backport for thwiki: enable WikiLove (T401279) (duration: 07m 46s)
  • 08:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P80882 and previous config saved to /var/cache/conftool/dbconfig/20250806-080359-fceratto.json
  • 08:03 reedy@deploy1003: reedy, chlod: Continuing with sync
  • 08:03 reedy@deploy1003: reedy, chlod: Backport for thwiki: enable WikiLove (T401279) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 08:01 reedy@deploy1003: Started scap sync-world: Backport for thwiki: enable WikiLove (T401279)
  • 07:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P80881 and previous config saved to /var/cache/conftool/dbconfig/20250806-074851-fceratto.json
  • 07:45 Reedy: created wikilove tables on thwiki T401279
  • 07:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T399728)', diff saved to https://phabricator.wikimedia.org/P80880 and previous config saved to /var/cache/conftool/dbconfig/20250806-073343-fceratto.json
  • 07:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T399728)', diff saved to https://phabricator.wikimedia.org/P80879 and previous config saved to /var/cache/conftool/dbconfig/20250806-072448-fceratto.json
  • 07:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T399728)', diff saved to https://phabricator.wikimedia.org/P80878 and previous config saved to /var/cache/conftool/dbconfig/20250806-072425-fceratto.json
  • 07:13 kartik@deploy1003: Finished scap sync-world: Backport for Enable the Contribute menu in 9th group of Wikipedias (T397122) (duration: 09m 37s)
  • 07:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P80877 and previous config saved to /var/cache/conftool/dbconfig/20250806-070918-fceratto.json
  • 07:08 kartik@deploy1003: kartik: Continuing with sync
  • 07:05 kartik@deploy1003: kartik: Backport for Enable the Contribute menu in 9th group of Wikipedias (T397122) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 07:03 kartik@deploy1003: Started scap sync-world: Backport for Enable the Contribute menu in 9th group of Wikipedias (T397122)
  • 06:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P80876 and previous config saved to /var/cache/conftool/dbconfig/20250806-065410-fceratto.json
  • 06:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T399728)', diff saved to https://phabricator.wikimedia.org/P80875 and previous config saved to /var/cache/conftool/dbconfig/20250806-063903-fceratto.json
  • 06:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T399728)', diff saved to https://phabricator.wikimedia.org/P80874 and previous config saved to /var/cache/conftool/dbconfig/20250806-063521-fceratto.json
  • 06:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 03:05 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host sretest2003.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
  • 00:29 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 00:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T399728)', diff saved to https://phabricator.wikimedia.org/P80873 and previous config saved to /var/cache/conftool/dbconfig/20250806-002921-fceratto.json
  • 00:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P80872 and previous config saved to /var/cache/conftool/dbconfig/20250806-001413-fceratto.json

2025-08-05

  • 23:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P80871 and previous config saved to /var/cache/conftool/dbconfig/20250805-235905-fceratto.json
  • 23:55 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 23:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T399728)', diff saved to https://phabricator.wikimedia.org/P80870 and previous config saved to /var/cache/conftool/dbconfig/20250805-234358-fceratto.json
  • 23:39 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T399728)', diff saved to https://phabricator.wikimedia.org/P80869 and previous config saved to /var/cache/conftool/dbconfig/20250805-233907-fceratto.json
  • 23:39 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 23:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T399728)', diff saved to https://phabricator.wikimedia.org/P80868 and previous config saved to /var/cache/conftool/dbconfig/20250805-233843-fceratto.json
  • 23:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P80867 and previous config saved to /var/cache/conftool/dbconfig/20250805-232336-fceratto.json
  • 23:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P80866 and previous config saved to /var/cache/conftool/dbconfig/20250805-230828-fceratto.json
  • 22:55 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2011.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 22:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T399728)', diff saved to https://phabricator.wikimedia.org/P80865 and previous config saved to /var/cache/conftool/dbconfig/20250805-225320-fceratto.json
  • 22:52 bking@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=wdqs-scholarly,name=eqiad
  • 22:48 sfaci@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T399728)', diff saved to https://phabricator.wikimedia.org/P80864 and previous config saved to /var/cache/conftool/dbconfig/20250805-224824-fceratto.json
  • 22:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 22:48 sfaci@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 22:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T399728)', diff saved to https://phabricator.wikimedia.org/P80863 and previous config saved to /var/cache/conftool/dbconfig/20250805-224801-fceratto.json
  • 22:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P80862 and previous config saved to /var/cache/conftool/dbconfig/20250805-223253-fceratto.json
  • 22:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 22:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P80861 and previous config saved to /var/cache/conftool/dbconfig/20250805-221746-fceratto.json
  • 22:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:05 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 22:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T399728)', diff saved to https://phabricator.wikimedia.org/P80860 and previous config saved to /var/cache/conftool/dbconfig/20250805-220238-fceratto.json
  • 21:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbprov1007.eqiad.wmnet with reason: host reimage
  • 21:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T399728)', diff saved to https://phabricator.wikimedia.org/P80859 and previous config saved to /var/cache/conftool/dbconfig/20250805-215738-fceratto.json
  • 21:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 21:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T399728)', diff saved to https://phabricator.wikimedia.org/P80858 and previous config saved to /var/cache/conftool/dbconfig/20250805-215715-fceratto.json
  • 21:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbprov1007.eqiad.wmnet with reason: host reimage
  • 21:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 21:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P80857 and previous config saved to /var/cache/conftool/dbconfig/20250805-214208-fceratto.json
  • 21:41 jgleeson: SmashPig upgraded from a7e897ec to 83293ee1
  • 21:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 21:31 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 21:31 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 21:31 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 21:31 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 21:31 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 21:31 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 21:31 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov1007.eqiad.wmnet with OS bookworm
  • 21:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P80856 and previous config saved to /var/cache/conftool/dbconfig/20250805-212701-fceratto.json
  • 21:25 brett@dns1004: END - running authdns-update
  • 21:25 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 21:25 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 21:24 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 21:24 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 21:24 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 21:24 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 21:23 bking@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=wdqs-scholarly,name=eqiad
  • 21:22 brett@dns1004: START - running authdns-update
  • 21:18 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:17 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards
  • 21:14 ryankemper@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-reload (exit_code=97) reloading scholarly_articles on wdqs1024.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250714/ using stat1009.eqiad.wmnet)
  • 21:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T399728)', diff saved to https://phabricator.wikimedia.org/P80855 and previous config saved to /var/cache/conftool/dbconfig/20250805-211153-fceratto.json
  • 21:07 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbprov1007.eqiad.wmnet with OS bookworm
  • 21:06 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T399728)', diff saved to https://phabricator.wikimedia.org/P80854 and previous config saved to /var/cache/conftool/dbconfig/20250805-210649-fceratto.json
  • 21:06 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 21:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80853 and previous config saved to /var/cache/conftool/dbconfig/20250805-210627-fceratto.json
  • 20:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P80852 and previous config saved to /var/cache/conftool/dbconfig/20250805-205119-fceratto.json
  • 20:46 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2010.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:40 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov2007.codfw.wmnet with OS bookworm
  • 20:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P80851 and previous config saved to /var/cache/conftool/dbconfig/20250805-203612-fceratto.json
  • 20:35 ebernhardson: starting cluster mutation test on relforge*.eqiad.wmnet servers
  • 20:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80850 and previous config saved to /var/cache/conftool/dbconfig/20250805-202104-fceratto.json
  • 20:20 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 20:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:16 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T399728)', diff saved to https://phabricator.wikimedia.org/P80849 and previous config saved to /var/cache/conftool/dbconfig/20250805-201601-fceratto.json
  • 20:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 20:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80848 and previous config saved to /var/cache/conftool/dbconfig/20250805-201539-fceratto.json
  • 20:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P80847 and previous config saved to /var/cache/conftool/dbconfig/20250805-200031-fceratto.json
  • 19:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host dbprov1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:49 mutante: [gitlab2002:~] $ sudo systemctl start wmf_auto_restart_ssh-gitlab T401191
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for dbprov1007 - jclark@cumin1002"
  • 19:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for dbprov1007 - jclark@cumin1002"
  • 19:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P80846 and previous config saved to /var/cache/conftool/dbconfig/20250805-194524-fceratto.json
  • 19:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80845 and previous config saved to /var/cache/conftool/dbconfig/20250805-193016-fceratto.json
  • 19:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T399728)', diff saved to https://phabricator.wikimedia.org/P80844 and previous config saved to /var/cache/conftool/dbconfig/20250805-192410-fceratto.json
  • 19:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 19:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T399728)', diff saved to https://phabricator.wikimedia.org/P80843 and previous config saved to /var/cache/conftool/dbconfig/20250805-192347-fceratto.json
  • 19:22 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs2007.codfw.wmnet -> wdqs2008.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 19:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P80842 and previous config saved to /var/cache/conftool/dbconfig/20250805-190840-fceratto.json
  • 19:04 rzl: rzl@deploy1003:/srv/deployment-charts$ sudo git restore helmfile.d/dse-k8s-services/airflow-ml/values-production.yaml # discarding local changes to unblock the minutely git pull
  • 19:01 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host dbprov2007.codfw.wmnet with OS bookworm
  • 19:01 krinkle@deploy1003: Finished scap sync-world: Backport for Profiler: Remove support for php-tideways_xhprof (T401152) (duration: 14m 54s)
  • 18:55 krinkle@deploy1003: krinkle: Continuing with sync
  • 18:54 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P80841 and previous config saved to /var/cache/conftool/dbconfig/20250805-185332-fceratto.json
  • 18:50 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:50 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbprov2007
  • 18:50 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dbprov2007
  • 18:50 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:48 krinkle@deploy1003: krinkle: Backport for Profiler: Remove support for php-tideways_xhprof (T401152) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:47 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 18:46 krinkle@deploy1003: Started scap sync-world: Backport for Profiler: Remove support for php-tideways_xhprof (T401152)
  • 18:39 dancy@deploy1003: Finished scap sync-world: testing T398875 (duration: 02m 54s)
  • 18:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T399728)', diff saved to https://phabricator.wikimedia.org/P80840 and previous config saved to /var/cache/conftool/dbconfig/20250805-183824-fceratto.json
  • 18:37 dancy@deploy1003: Started scap sync-world: testing T398875
  • 18:35 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 39s)
  • 18:35 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 18:35 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 18:35 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 18:35 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 18:35 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 18:35 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 18:33 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T399728)', diff saved to https://phabricator.wikimedia.org/P80839 and previous config saved to /var/cache/conftool/dbconfig/20250805-183319-fceratto.json
  • 18:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 18:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T399728)', diff saved to https://phabricator.wikimedia.org/P80838 and previous config saved to /var/cache/conftool/dbconfig/20250805-183256-fceratto.json
  • 18:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 18:25 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 18:18 dancy@deploy1003: Installation of scap version "4.196.0" completed for 2 hosts
  • 18:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P80837 and previous config saved to /var/cache/conftool/dbconfig/20250805-181749-fceratto.json
  • 18:16 dancy@deploy1003: Installing scap version "4.196.0" for 2 host(s)
  • 18:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P80836 and previous config saved to /var/cache/conftool/dbconfig/20250805-180241-fceratto.json
  • 17:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T399728)', diff saved to https://phabricator.wikimedia.org/P80835 and previous config saved to /var/cache/conftool/dbconfig/20250805-174734-fceratto.json
  • 17:42 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading scholarly_articles on wdqs1024.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20250714/ using stat1009.eqiad.wmnet)
  • 17:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T399728)', diff saved to https://phabricator.wikimedia.org/P80834 and previous config saved to /var/cache/conftool/dbconfig/20250805-174219-fceratto.json
  • 17:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 17:38 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 17:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T399728)', diff saved to https://phabricator.wikimedia.org/P80833 and previous config saved to /var/cache/conftool/dbconfig/20250805-173835-fceratto.json
  • 17:37 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:37 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:37 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:37 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:37 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:37 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:35 krinkle@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
  • 17:33 krinkle@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
  • 17:28 swfrench@deploy1003: Finished scap sync-world: Migrate debug and cli images to xhprof - T401152 (duration: 22m 02s)
  • 17:27 swfrench@deploy1003: swfrench: Continuing with sync
  • 17:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80832 and previous config saved to /var/cache/conftool/dbconfig/20250805-172327-fceratto.json
  • 17:19 bblack@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jobo out of all services on: 2395 hosts
  • 17:16 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:15 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:15 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:14 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:14 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:14 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:14 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:14 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:14 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:14 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:14 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:14 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:14 bking@cumin2002: END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 17:12 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:12 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:12 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:11 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:11 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:11 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:10 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:09 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:09 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:09 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:09 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:08 swfrench@deploy1003: swfrench: Migrate debug and cli images to xhprof - T401152 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:08 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P80831 and previous config saved to /var/cache/conftool/dbconfig/20250805-170820-fceratto.json
  • 17:08 dani@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:08 dani@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:07 swfrench@deploy1003: Started scap sync-world: Migrate debug and cli images to xhprof - T401152
  • 17:05 krinkle@deploy1003: Finished scap sync-world: Backport for Profiler: Add php-xhprof support besides php-tideways_xhprof (T401152) (duration: 11m 15s)
  • 17:02 bking@cumin2002: START - Cookbook sre.wdqs.data-transfer (T386098, transfer newly-reloaded data) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2007.codfw.wmnet w/ force delete existing files, repooling both afterwards
  • 16:59 krinkle@deploy1003: krinkle: Continuing with sync
  • 16:56 bblack@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jobo out of all services on: 2395 hosts
  • 16:55 krinkle@deploy1003: krinkle: Backport for Profiler: Add php-xhprof support besides php-tideways_xhprof (T401152) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:53 krinkle@deploy1003: Started scap sync-world: Backport for Profiler: Add php-xhprof support besides php-tideways_xhprof (T401152)
  • 16:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T399728)', diff saved to https://phabricator.wikimedia.org/P80830 and previous config saved to /var/cache/conftool/dbconfig/20250805-165312-fceratto.json
  • 16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T399728)', diff saved to https://phabricator.wikimedia.org/P80829 and previous config saved to /var/cache/conftool/dbconfig/20250805-164902-fceratto.json
  • 16:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 16:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 16:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1015 to dse-k8s-worker1018
  • 16:47 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1018
  • 16:45 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1018
  • 16:45 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1018 on all recursors
  • 16:45 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1018 on all recursors
  • 16:45 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:45 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1015 to dse-k8s-worker1018 - btullis@cumin1003"
  • 16:44 bblack@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jobo out of all services on: 2396 hosts
  • 16:40 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1015 to dse-k8s-worker1018 - btullis@cumin1003"
  • 16:34 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:34 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:34 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:33 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1015 to dse-k8s-worker1018
  • 16:32 jgiannelos@deploy1003: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 16:32 jgiannelos@deploy1003: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 16:31 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1013 to dse-k8s-worker1017
  • 16:30 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1017
  • 16:27 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 16:25 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1017
  • 16:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1017 on all recursors
  • 16:25 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1017 on all recursors
  • 16:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1013 to dse-k8s-worker1017 - btullis@cumin1003"
  • 16:25 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1013 to dse-k8s-worker1017 - btullis@cumin1003"
  • 16:15 mszabo@deploy1003: Finished scap sync-world: Backport for UserInfoCard: Fix UA exclusion in stream config (duration: 11m 34s)
  • 16:10 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 16:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T399728)', diff saved to https://phabricator.wikimedia.org/P80828 and previous config saved to /var/cache/conftool/dbconfig/20250805-161038-fceratto.json
  • 16:09 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 16:09 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1013 to dse-k8s-worker1017
  • 16:08 mszabo@deploy1003: mszabo: Continuing with sync
  • 16:07 mszabo@deploy1003: mszabo: Backport for UserInfoCard: Fix UA exclusion in stream config synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 16:04 mszabo@deploy1003: Started scap sync-world: Backport for UserInfoCard: Fix UA exclusion in stream config
  • 16:01 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 15:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P80827 and previous config saved to /var/cache/conftool/dbconfig/20250805-155530-fceratto.json
  • 15:49 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1012 to dse-k8s-worker1016
  • 15:48 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1016
  • 15:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P80826 and previous config saved to /var/cache/conftool/dbconfig/20250805-154023-fceratto.json
  • 15:36 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1016
  • 15:36 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1016 on all recursors
  • 15:36 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1016 on all recursors
  • 15:36 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:36 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1012 to dse-k8s-worker1016 - btullis@cumin1003"
  • 15:31 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1012 to dse-k8s-worker1016 - btullis@cumin1003"
  • 15:27 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 15:27 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1012 to dse-k8s-worker1016
  • 15:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from snapshot1011 to dse-k8s-worker1015
  • 15:26 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1015
  • 15:25 brennen@deploy1003: Finished deploy [phabricator/deployment@7b907e8]: deploy phab1004 for T401213 (duration: 00m 40s)
  • 15:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T399728)', diff saved to and previous config saved to /var/cache/conftool/dbconfig/20250805-152515-fceratto.json
  • 15:25 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1015
  • 15:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1015 on all recursors
  • 15:25 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1015 on all recursors
  • 15:25 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1011 to dse-k8s-worker1015 - btullis@cumin1003"
  • 15:24 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming snapshot1011 to dse-k8s-worker1015 - btullis@cumin1003"
  • 15:24 brennen@deploy1003: Started deploy [phabricator/deployment@7b907e8]: deploy phab1004 for T401213
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T399728)', diff saved to https://phabricator.wikimedia.org/P80825 and previous config saved to /var/cache/conftool/dbconfig/20250805-152232-fceratto.json
  • 15:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 15:22 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T399728)', diff saved to https://phabricator.wikimedia.org/P80824 and previous config saved to /var/cache/conftool/dbconfig/20250805-152208-fceratto.json
  • 15:22 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:21 brennen@deploy1003: Finished deploy [phabricator/deployment@7b907e8]: deploy phab2002 for T401213 (duration: 00m 41s)
  • 15:20 brennen@deploy1003: Started deploy [phabricator/deployment@7b907e8]: deploy phab2002 for T401213
  • 15:19 sukhe@cumin1003: START - Cookbook sre.dns.netbox
  • 15:18 sukhe@dns1004: END - running authdns-update
  • 15:17 sukhe@dns1004: START - running authdns-update
  • 15:17 dzahn@cumin2002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: version upgrade
  • 15:14 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phab deploy
  • 15:14 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phab deploy
  • 15:11 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 15:10 jhancock@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:07 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P80823 and previous config saved to /var/cache/conftool/dbconfig/20250805-150701-fceratto.json
  • 14:57 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 14:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P80822 and previous config saved to /var/cache/conftool/dbconfig/20250805-145153-fceratto.json
  • 14:49 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:49 jhancock@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 14:49 jhancock@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbprov2007 to codfw - jhancock@cumin1003"
  • 14:47 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbprov2007 to codfw - jhancock@cumin1003"
  • 14:44 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:43 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:42 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 14:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:37 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 14:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T399728)', diff saved to https://phabricator.wikimedia.org/P80821 and previous config saved to /var/cache/conftool/dbconfig/20250805-143646-fceratto.json
  • 14:34 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T399728)', diff saved to https://phabricator.wikimedia.org/P80820 and previous config saved to /var/cache/conftool/dbconfig/20250805-143359-fceratto.json
  • 14:33 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:33 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T399728)', diff saved to https://phabricator.wikimedia.org/P80819 and previous config saved to /var/cache/conftool/dbconfig/20250805-143336-fceratto.json
  • 14:26 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:24 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:18 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P80818 and previous config saved to /var/cache/conftool/dbconfig/20250805-141829-fceratto.json
  • 14:18 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:17 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:17 mszabo@deploy1003: Finished scap sync-world: Backport for UserInfoCard: Cap maximum count for thanks given/received (T398354) (duration: 36m 20s)
  • 14:17 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:16 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:16 cgoubert@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:15 cgoubert@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:15 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:14 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:14 cgoubert@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
  • 14:13 cgoubert@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
  • 14:13 cgoubert@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:12 cgoubert@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:12 cgoubert@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:09 cgoubert@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:09 cgoubert@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:08 cgoubert@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:06 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 14:06 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:06 btullis@cumin1003: START - Cookbook sre.hosts.rename from snapshot1011 to dse-k8s-worker1015
  • 14:05 mszabo@deploy1003: mszabo: Continuing with sync
  • 14:04 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:03 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:03 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P80817 and previous config saved to /var/cache/conftool/dbconfig/20250805-140321-fceratto.json
  • 14:02 mszabo@deploy1003: mszabo: Backport for UserInfoCard: Cap maximum count for thanks given/received (T398354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:01 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:50 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 13:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T399728)', diff saved to https://phabricator.wikimedia.org/P80816 and previous config saved to /var/cache/conftool/dbconfig/20250805-134814-fceratto.json
  • 13:45 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T399728)', diff saved to https://phabricator.wikimedia.org/P80815 and previous config saved to /var/cache/conftool/dbconfig/20250805-134539-fceratto.json
  • 13:45 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 13:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T399728)', diff saved to https://phabricator.wikimedia.org/P80814 and previous config saved to /var/cache/conftool/dbconfig/20250805-134515-fceratto.json
  • 13:45 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 13:41 mszabo@deploy1003: Started scap sync-world: Backport for UserInfoCard: Cap maximum count for thanks given/received (T398354)
  • 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2331.codfw.wmnet with OS bookworm
  • 13:40 elukey@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:39 elukey@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1003"
  • 13:30 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P80813 and previous config saved to /var/cache/conftool/dbconfig/20250805-133007-fceratto.json
  • 13:23 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2331.codfw.wmnet with reason: host reimage
  • 13:20 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 01m 45s)
  • 13:18 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 07s)
  • 13:17 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2331.codfw.wmnet with reason: host reimage
  • 13:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P80812 and previous config saved to /var/cache/conftool/dbconfig/20250805-131500-fceratto.json
  • 13:04 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host wikikube-worker2331.codfw.wmnet with OS bookworm
  • 13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:02 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T399728)', diff saved to https://phabricator.wikimedia.org/P80811 and previous config saved to /var/cache/conftool/dbconfig/20250805-125952-fceratto.json
  • 12:57 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T399728)', diff saved to https://phabricator.wikimedia.org/P80810 and previous config saved to /var/cache/conftool/dbconfig/20250805-125719-fceratto.json
  • 12:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 12:56 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T399728)', diff saved to https://phabricator.wikimedia.org/P80809 and previous config saved to /var/cache/conftool/dbconfig/20250805-125655-fceratto.json
  • 12:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-worker2331.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
  • 12:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P80807 and previous config saved to /var/cache/conftool/dbconfig/20250805-124147-fceratto.json
  • 12:35 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:35 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:26 gkyziridis@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P80806 and previous config saved to /var/cache/conftool/dbconfig/20250805-122640-fceratto.json
  • 12:26 gkyziridis@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 12:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T399728)', diff saved to https://phabricator.wikimedia.org/P80805 and previous config saved to /var/cache/conftool/dbconfig/20250805-121132-fceratto.json
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T399728)', diff saved to https://phabricator.wikimedia.org/P80803 and previous config saved to /var/cache/conftool/dbconfig/20250805-120857-fceratto.json
  • 12:08 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 12:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T399728)', diff saved to https://phabricator.wikimedia.org/P80802 and previous config saved to /var/cache/conftool/dbconfig/20250805-120835-fceratto.json
  • 11:53 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P80801 and previous config saved to /var/cache/conftool/dbconfig/20250805-115327-fceratto.json
  • 11:38 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P80800 and previous config saved to /var/cache/conftool/dbconfig/20250805-113820-fceratto.json
  • 11:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T399728)', diff saved to https://phabricator.wikimedia.org/P80799 and previous config saved to /var/cache/conftool/dbconfig/20250805-112312-fceratto.json
  • 11:20 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T399728)', diff saved to https://phabricator.wikimedia.org/P80798 and previous config saved to /var/cache/conftool/dbconfig/20250805-112036-fceratto.json
  • 11:20 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 11:20 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T399728)', diff saved to https://phabricator.wikimedia.org/P80797 and previous config saved to /var/cache/conftool/dbconfig/20250805-112014-fceratto.json
  • 11:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts dumpsdata1003.eqiad.wmnet
  • 11:15 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:15 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dumpsdata1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 11:14 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dumpsdata1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P80796 and previous config saved to /var/cache/conftool/dbconfig/20250805-110506-fceratto.json
  • 10:56 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@62138e1] (releasing): T401180 (duration: 00m 32s)
  • 10:56 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@62138e1] (releasing): T401180
  • 10:55 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:50 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts dumpsdata1003.eqiad.wmnet
  • 10:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P80795 and previous config saved to /var/cache/conftool/dbconfig/20250805-104959-fceratto.json
  • 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts snapshot1010.eqiad.wmnet
  • 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:47 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: snapshot1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 10:47 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: snapshot1010.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
  • 10:39 xSavitar: Ran fixStuckGlobalRename.php for T400862
  • 10:36 xSavitar: Ran fixStuckGlobalRename.php for T400974
  • 10:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T399728)', diff saved to https://phabricator.wikimedia.org/P80794 and previous config saved to /var/cache/conftool/dbconfig/20250805-103451-fceratto.json
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T399728)', diff saved to https://phabricator.wikimedia.org/P80793 and previous config saved to /var/cache/conftool/dbconfig/20250805-103213-fceratto.json
  • 10:32 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 10:31 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 10:31 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T399728)', diff saved to https://phabricator.wikimedia.org/P80792 and previous config saved to /var/cache/conftool/dbconfig/20250805-103055-fceratto.json
  • 10:23 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gitlab2002.wikimedia.org with OS bookworm
  • 10:18 btullis@cumin1003: START - Cookbook sre.dns.netbox
  • 10:15 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P80791 and previous config saved to /var/cache/conftool/dbconfig/20250805-101548-fceratto.json
  • 10:12 hashar@deploy1003: Finished scap sync-world: Backport for In robots.txt permit access to the sitemap API (T400023 T396684) (duration: 08m 01s)
  • 10:09 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts snapshot1010.eqiad.wmnet
  • 10:06 hashar@deploy1003: tstarling, hashar: Continuing with sync
  • 10:06 hashar@deploy1003: tstarling, hashar: Backport for In robots.txt permit access to the sitemap API (T400023 T396684) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 10:04 hashar@deploy1003: Started scap sync-world: Backport for In robots.txt permit access to the sitemap API (T400023 T396684)
  • 10:00 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P80790 and previous config saved to /var/cache/conftool/dbconfig/20250805-100040-fceratto.json
  • 09:59 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 09:55 jelto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gitlab2002.wikimedia.org with reason: host reimage
  • 09:51 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f2-codfw
  • 09:51 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f2-codfw
  • 09:45 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T399728)', diff saved to https://phabricator.wikimedia.org/P80789 and previous config saved to /var/cache/conftool/dbconfig/20250805-094533-fceratto.json
  • 09:45 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e4-codfw
  • 09:45 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e4-codfw
  • 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T399728)', diff saved to https://phabricator.wikimedia.org/P80788 and previous config saved to /var/cache/conftool/dbconfig/20250805-094244-fceratto.json
  • 09:42 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:42 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T399728)', diff saved to https://phabricator.wikimedia.org/P80787 and previous config saved to /var/cache/conftool/dbconfig/20250805-094221-fceratto.json
  • 09:37 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bookworm
  • 09:34 jelto@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host gitlab2002.wikimedia.org with OS bookworm
  • 09:33 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f4-codfw
  • 09:33 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f4-codfw
  • 09:31 hashar@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.45.0-wmf.13 refs T396374
  • 09:30 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f4-codfw
  • 09:30 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f4-codfw
  • 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f4-codfw
  • 09:29 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f4-codfw
  • 09:27 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P80786 and previous config saved to /var/cache/conftool/dbconfig/20250805-092714-fceratto.json
  • 09:20 hashar@deploy1003: Finished scap sync-world: Backport for Authorize self for Google Search Console (T400023) (duration: 17m 50s)
  • 09:12 hashar@deploy1003: tstarling, hashar: Continuing with sync
  • 09:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P80785 and previous config saved to /var/cache/conftool/dbconfig/20250805-091206-fceratto.json
  • 09:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e2-codfw
  • 09:07 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e2-codfw
  • 09:07 hashar@deploy1003: tstarling, hashar: Backport for Authorize self for Google Search Console (T400023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 09:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e4-codfw
  • 09:02 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e4-codfw
  • 09:02 hashar@deploy1003: Started scap sync-world: Backport for Authorize self for Google Search Console (T400023)
  • 08:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e5-codfw
  • 08:59 mwpresync@deploy1003: Finished scap sync-world: testwikis to 1.45.0-wmf.13 refs T396374 (duration: 40m 12s)
  • 08:59 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-e5-codfw
  • 08:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T399728)', diff saved to https://phabricator.wikimedia.org/P80784 and previous config saved to /var/cache/conftool/dbconfig/20250805-085658-fceratto.json
  • 08:54 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T399728)', diff saved to https://phabricator.wikimedia.org/P80783 and previous config saved to /var/cache/conftool/dbconfig/20250805-085424-fceratto.json
  • 08:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f4-codfw
  • 08:54 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f4-codfw
  • 08:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:54 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f4-codfw
  • 08:54 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f4-codfw
  • 08:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f2-codfw
  • 08:38 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device lsw1-f2-codfw
  • 08:19 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.45.0-wmf.13 refs T396374
  • 08:18 hashar: train: sudo systemctl start train-presync # T396374
  • 08:12 jelto@cumin1003: START - Cookbook sre.hosts.reimage for host gitlab2002.wikimedia.org with OS bookworm
  • 08:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:08 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: codfw Nokia switches mgmt - ayounsi@cumin1003"
  • 08:08 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: codfw Nokia switches mgmt - ayounsi@cumin1003"
  • 08:04 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 07:00 dcausse: repooling wdqs1021
  • 06:36 dcausse: restarting blazegraph on wdqs1021 (stuck)
  • 06:33 dcausse: repooling wdqs1016
  • 04:27 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 04:23 eileen: civicrm upgraded from f202b616 to e591fe72
  • 04:07 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 04:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 04:02 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 04:01 mwpresync@deploy1003: Pruned MediaWiki: 1.45.0-wmf.10 (duration: 01m 53s)
  • 03:08 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 03:07 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 02:45 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bookworm
  • 02:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 02:41 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:24 jhancock@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbprov2007.codfw.wmnet with OS bookworm
  • 02:23 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 02:20 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 02:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 02:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:47 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1042.eqiad.wmnet with OS bullseye
  • 01:41 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:37 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 01:34 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 01:34 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 01:17 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbprov2007
  • 01:16 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dbprov2007
  • 01:16 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:13 jhancock@cumin1003: START - Cookbook sre.dns.netbox
  • 01:13 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 57s)
  • 01:03 jhancock@cumin1003: START - Cookbook sre.hosts.reimage for host dbprov2007.codfw.wmnet with OS bookworm
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 01:00 jhancock@cumin1003: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['dbprov2007']
  • 00:59 jhancock@cumin1003: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['dbprov2007']
  • 00:53 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 00:51 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:47 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1043.eqiad.wmnet with reason: host reimage
  • 00:38 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:29 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:28 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 00:25 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:22 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 00:16 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:10 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host dbprov2007.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 00:10 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbprov2007
  • 00:10 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dbprov2007
  • 00:09 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:09 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbprov2007 to codfw - jhancock@cumin1003"
  • 00:09 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbprov2007 to codfw - jhancock@cumin1003"
  • 00:08 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 00:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 00:06 jhancock@cumin1003: START - Cookbook sre.dns.netbox

2025-08-04

  • 23:42 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bookworm
  • 21:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T400854)', diff saved to https://phabricator.wikimedia.org/P80782 and previous config saved to /var/cache/conftool/dbconfig/20250804-214644-ladsgroup.json
  • 21:39 kemayo@deploy1003: Finished scap sync-world: Backport for Change search teardown focus to not use an over-broad route (T401090) (duration: 08m 08s)
  • 21:33 kemayo@deploy1003: kemayo: Continuing with sync
  • 21:32 kemayo@deploy1003: kemayo: Backport for Change search teardown focus to not use an over-broad route (T401090) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P80781 and previous config saved to /var/cache/conftool/dbconfig/20250804-213136-ladsgroup.json
  • 21:31 kemayo@deploy1003: Started scap sync-world: Backport for Change search teardown focus to not use an over-broad route (T401090)
  • 21:16 ebernhardson@deploy1003: Finished scap sync-world: Backport for Revert "cirrus: Start AB test of completion suggester fuzziness" (T397732), Clean up CirrusSearch settings on ex-wikipedia special wikis (T400062) (duration: 08m 06s)
  • 21:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P80780 and previous config saved to /var/cache/conftool/dbconfig/20250804-211628-ladsgroup.json
  • 21:14 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 21:11 ebernhardson@deploy1003: ebernhardson: Continuing with sync
  • 21:10 ebernhardson@deploy1003: ebernhardson: Backport for Revert "cirrus: Start AB test of completion suggester fuzziness" (T397732), Clean up CirrusSearch settings on ex-wikipedia special wikis (T400062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 21:08 ebernhardson@deploy1003: Started scap sync-world: Backport for Revert "cirrus: Start AB test of completion suggester fuzziness" (T397732), Clean up CirrusSearch settings on ex-wikipedia special wikis (T400062)
  • 21:03 cjming@deploy1003: Finished scap sync-world: Backport for Clear edit count when unattaching local users for rename (T313900), fixStuckGlobalRename: Fix using actor_id from the wrong wiki (T398177), SessionManager: Add $sessionWriteReason to shutdown and when saves are triggered from the destructor (T400249) (duration: 07m 36s)
  • 21:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T400854)', diff saved to https://phabricator.wikimedia.org/P80779 and previous config saved to /var/cache/conftool/dbconfig/20250804-210119-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2222 (T400854)', diff saved to https://phabricator.wikimedia.org/P80778 and previous config saved to /var/cache/conftool/dbconfig/20250804-205837-ladsgroup.json
  • 20:58 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2222.codfw.wmnet with reason: Maintenance
  • 20:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T400854)', diff saved to https://phabricator.wikimedia.org/P80777 and previous config saved to /var/cache/conftool/dbconfig/20250804-205813-ladsgroup.json
  • 20:57 cjming@deploy1003: matmarex, cjming: Continuing with sync
  • 20:57 cjming@deploy1003: matmarex, cjming: Backport for Clear edit count when unattaching local users for rename (T313900), fixStuckGlobalRename: Fix using actor_id from the wrong wiki (T398177), SessionManager: Add $sessionWriteReason to shutdown and when saves are triggered from the destructor (T400249) synced to the testservers (see https://wikitech.wikimedia.org/w
  • 20:55 cjming@deploy1003: Started scap sync-world: Backport for Clear edit count when unattaching local users for rename (T313900), fixStuckGlobalRename: Fix using actor_id from the wrong wiki (T398177), SessionManager: Add $sessionWriteReason to shutdown and when saves are triggered from the destructor (T400249)
  • 20:45 ottomata: eventgate-analytics in eqiad cannot be deployed due to stuck helm STATUS: pending-upgrade. This needs to be deployed to roll back to a version that doesn't cause logspam. cc cwhite, rzl - T376026
  • 20:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P80776 and previous config saved to /var/cache/conftool/dbconfig/20250804-204305-ladsgroup.json
  • 20:39 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.12/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki metawiki --exceptions countryExceptionMappings.csv --commit
  • 20:37 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.12/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki officewiki --exceptions countryExceptionMappings.csv --commit
  • 20:36 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:36 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 20:35 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:35 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.12/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki test2wiki --exceptions countryExceptionMappings.csv --commit
  • 20:34 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 20:34 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:34 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:34 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 20:33 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 20:33 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 20:33 Daimona: mwscript-k8s --comment="T397270" -f --file /srv/mediawiki/php-1.45.0-wmf.12/extensions/CampaignEvents/maintenance/countryExceptionMappings.csv -- CampaignEvents:UpdateCountriesColumn --wiki testwiki --exceptions countryExceptionMappings.csv --commit
  • 20:32 swfrench-wmf: reprepro include php8.3_8.3.24-1+wmf11u2 in component/php83 - T398245
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P80771 and previous config saved to /var/cache/conftool/dbconfig/20250804-202754-ladsgroup.json
  • 20:26 Daimona: Re-run CampaignEvents country migration script in dry-run mode one last time for all wikis # T397270
  • 20:24 cjming@deploy1003: Finished scap sync-world: Backport for Add exceptions to country code migration script following test (T397270) (duration: 07m 30s)
  • 20:19 cjming@deploy1003: daimona, cjming: Continuing with sync
  • 20:19 cjming@deploy1003: daimona, cjming: Backport for Add exceptions to country code migration script following test (T397270) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:17 cjming@deploy1003: Started scap sync-world: Backport for Add exceptions to country code migration script following test (T397270)
  • 20:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:15 krinkle@deploy1003: Finished scap sync-world: Backport for Set wgCentralBannerRecorder to /beacon/… instead of //example.org/beacon/… (T400586) (duration: 09m 05s)
  • 20:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T400854)', diff saved to https://phabricator.wikimedia.org/P80769 and previous config saved to /var/cache/conftool/dbconfig/20250804-201246-ladsgroup.json
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2221 (T400854)', diff saved to https://phabricator.wikimedia.org/P80768 and previous config saved to /var/cache/conftool/dbconfig/20250804-201003-ladsgroup.json
  • 20:09 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2221.codfw.wmnet with reason: Maintenance
  • 20:09 krinkle@deploy1003: krinkle: Continuing with sync
  • 20:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T400854)', diff saved to https://phabricator.wikimedia.org/P80767 and previous config saved to /var/cache/conftool/dbconfig/20250804-200938-ladsgroup.json
  • 20:07 krinkle@deploy1003: krinkle: Backport for Set wgCentralBannerRecorder to /beacon/… instead of //example.org/beacon/… (T400586) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 20:06 krinkle@deploy1003: Started scap sync-world: Backport for Set wgCentralBannerRecorder to /beacon/… instead of //example.org/beacon/… (T400586)
  • 20:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 20:04 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 20:02 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 20:01 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 20:01 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 20:01 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 20:01 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 19:59 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 19:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P80765 and previous config saved to /var/cache/conftool/dbconfig/20250804-195431-ladsgroup.json
  • 19:50 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 19:39 rzl@deploy1003: mwscript-k8s job started: Version.php --wiki=urwiki # Testing --sal for T376776
  • 19:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P80764 and previous config saved to /var/cache/conftool/dbconfig/20250804-193923-ladsgroup.json
  • 19:38 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:37 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:36 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 19:36 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 19:35 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:35 ottomata: deploying eventgate-analytics and eventgate-main to pick up meta.dt field logic change - T376026
  • 19:35 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T400854)', diff saved to https://phabricator.wikimedia.org/P80763 and previous config saved to /var/cache/conftool/dbconfig/20250804-192415-ladsgroup.json
  • 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T400854)', diff saved to https://phabricator.wikimedia.org/P80762 and previous config saved to /var/cache/conftool/dbconfig/20250804-192129-ladsgroup.json
  • 19:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 19:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T400854)', diff saved to https://phabricator.wikimedia.org/P80761 and previous config saved to /var/cache/conftool/dbconfig/20250804-192107-ladsgroup.json
  • 19:20 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 19:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 19:17 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 19:12 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 19:12 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T399728)', diff saved to https://phabricator.wikimedia.org/P80760 and previous config saved to /var/cache/conftool/dbconfig/20250804-191213-fceratto.json
  • 19:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P80759 and previous config saved to /var/cache/conftool/dbconfig/20250804-190559-ladsgroup.json
  • 18:59 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:58 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:57 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80758 and previous config saved to /var/cache/conftool/dbconfig/20250804-185705-fceratto.json
  • 18:55 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P80757 and previous config saved to /var/cache/conftool/dbconfig/20250804-185052-ladsgroup.json
  • 18:41 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P80756 and previous config saved to /var/cache/conftool/dbconfig/20250804-184156-fceratto.json
  • 18:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T400854)', diff saved to https://phabricator.wikimedia.org/P80755 and previous config saved to /var/cache/conftool/dbconfig/20250804-183543-ladsgroup.json
  • 18:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T400854)', diff saved to https://phabricator.wikimedia.org/P80754 and previous config saved to /var/cache/conftool/dbconfig/20250804-183259-ladsgroup.json
  • 18:32 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 18:31 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 18:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T400854)', diff saved to https://phabricator.wikimedia.org/P80753 and previous config saved to /var/cache/conftool/dbconfig/20250804-183033-ladsgroup.json
  • 18:26 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T399728)', diff saved to https://phabricator.wikimedia.org/P80752 and previous config saved to /var/cache/conftool/dbconfig/20250804-182649-fceratto.json
  • 18:24 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T399728)', diff saved to https://phabricator.wikimedia.org/P80751 and previous config saved to /var/cache/conftool/dbconfig/20250804-182420-fceratto.json
  • 18:24 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 18:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:23 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T399728)', diff saved to https://phabricator.wikimedia.org/P80750 and previous config saved to /var/cache/conftool/dbconfig/20250804-182309-fceratto.json
  • 18:23 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1043.eqiad.wmnet with OS bullseye
  • 18:20 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:19 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P80749 and previous config saved to /var/cache/conftool/dbconfig/20250804-181526-ladsgroup.json
  • 18:08 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80748 and previous config saved to /var/cache/conftool/dbconfig/20250804-180801-fceratto.json
  • 18:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:06 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up rebuilt mediawiki-httpd image (duration: 08m 33s)
  • 18:02 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 18:01 swfrench@deploy1003: swfrench: Continuing with sync
  • 18:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P80747 and previous config saved to /var/cache/conftool/dbconfig/20250804-180017-ladsgroup.json
  • 17:59 swfrench@deploy1003: swfrench: Deployment to pick up rebuilt mediawiki-httpd image synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:58 swfrench@deploy1003: Started scap sync-world: Deployment to pick up rebuilt mediawiki-httpd image
  • 17:54 dancy@deploy1003: Installation of scap version "4.195.0" completed for 2 hosts
  • 17:53 dancy@deploy1003: Installing scap version "4.195.0" for 2 host(s)
  • 17:52 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P80746 and previous config saved to /var/cache/conftool/dbconfig/20250804-175252-fceratto.json
  • 17:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T400854)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250804-174505-ladsgroup.json
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T400854)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250804-174212-ladsgroup.json
  • 17:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T400854)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250804-174145-ladsgroup.json
  • 17:37 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T399728)', diff saved to https://phabricator.wikimedia.org/P80743 and previous config saved to /var/cache/conftool/dbconfig/20250804-173745-fceratto.json
  • 17:35 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T399728)', diff saved to https://phabricator.wikimedia.org/P80742 and previous config saved to /var/cache/conftool/dbconfig/20250804-173518-fceratto.json
  • 17:35 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 17:34 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T399728)', diff saved to https://phabricator.wikimedia.org/P80741 and previous config saved to /var/cache/conftool/dbconfig/20250804-173454-fceratto.json
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P80740 and previous config saved to /var/cache/conftool/dbconfig/20250804-172637-ladsgroup.json
  • 17:19 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80739 and previous config saved to /var/cache/conftool/dbconfig/20250804-171945-fceratto.json
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P80738 and previous config saved to /var/cache/conftool/dbconfig/20250804-171130-ladsgroup.json
  • 17:04 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P80737 and previous config saved to /var/cache/conftool/dbconfig/20250804-170436-fceratto.json
  • 16:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80736 and previous config saved to /var/cache/conftool/dbconfig/20250804-165623-ladsgroup.json
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T400854)', diff saved to https://phabricator.wikimedia.org/P80735 and previous config saved to /var/cache/conftool/dbconfig/20250804-165335-ladsgroup.json
  • 16:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 16:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80734 and previous config saved to /var/cache/conftool/dbconfig/20250804-165312-ladsgroup.json
  • 16:49 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T399728)', diff saved to https://phabricator.wikimedia.org/P80733 and previous config saved to /var/cache/conftool/dbconfig/20250804-164928-fceratto.json
  • 16:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T399728)', diff saved to https://phabricator.wikimedia.org/P80732 and previous config saved to /var/cache/conftool/dbconfig/20250804-164759-fceratto.json
  • 16:47 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 16:47 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T399728)', diff saved to https://phabricator.wikimedia.org/P80731 and previous config saved to /var/cache/conftool/dbconfig/20250804-164736-fceratto.json
  • 16:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P80730 and previous config saved to /var/cache/conftool/dbconfig/20250804-163803-ladsgroup.json
  • 16:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80729 and previous config saved to /var/cache/conftool/dbconfig/20250804-163226-fceratto.json
  • 16:31 Lucas_WMDE: lucaswerkmeister-wmde Deployed security patch for T401099
  • 16:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P80725 and previous config saved to /var/cache/conftool/dbconfig/20250804-162255-ladsgroup.json
  • 16:19 Daimona: Running maintenance script for T397270 in x1: testwiki, test2wiki, officewiki, wikishared
  • 16:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P80723 and previous config saved to /var/cache/conftool/dbconfig/20250804-161718-fceratto.json
  • 16:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80722 and previous config saved to /var/cache/conftool/dbconfig/20250804-160746-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80721 and previous config saved to /var/cache/conftool/dbconfig/20250804-160456-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T400854)', diff saved to https://phabricator.wikimedia.org/P80720 and previous config saved to /var/cache/conftool/dbconfig/20250804-160433-ladsgroup.json
  • 16:04 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwmaint2002.codfw.wmnet
  • 16:04 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 jasmine@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwmaint2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 16:03 jasmine@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwmaint2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 16:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T399728)', diff saved to https://phabricator.wikimedia.org/P80719 and previous config saved to /var/cache/conftool/dbconfig/20250804-160210-fceratto.json
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T399728)', diff saved to https://phabricator.wikimedia.org/P80718 and previous config saved to /var/cache/conftool/dbconfig/20250804-155941-fceratto.json
  • 15:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 15:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80717 and previous config saved to /var/cache/conftool/dbconfig/20250804-155919-fceratto.json
  • 15:58 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 15:57 jasmine@cumin1003: START - Cookbook sre.dns.netbox
  • 15:52 jasmine@cumin1003: START - Cookbook sre.hosts.decommission for hosts mwmaint2002.codfw.wmnet
  • 15:50 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 15:49 jasmine@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mwmaint1002.eqiad.wmnet
  • 15:49 jasmine@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:49 jasmine@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwmaint1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P80716 and previous config saved to /var/cache/conftool/dbconfig/20250804-154925-ladsgroup.json
  • 15:49 jasmine@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mwmaint1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin1003"
  • 15:44 jasmine@cumin1003: START - Cookbook sre.dns.netbox
  • 15:44 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1007.eqiad.wmnet
  • 15:44 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80715 and previous config saved to /var/cache/conftool/dbconfig/20250804-154410-fceratto.json
  • 15:41 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1007.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:39 jasmine@cumin1003: START - Cookbook sre.hosts.decommission for hosts mwmaint1002.eqiad.wmnet
  • 15:34 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:34 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P80714 and previous config saved to /var/cache/conftool/dbconfig/20250804-153418-ladsgroup.json
  • 15:33 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1042.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:31 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:30 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:30 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:29 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-airflow1007.eqiad.wmnet
  • 15:29 jgreen@dns1004: END - running authdns-update
  • 15:29 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:29 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1006.eqiad.wmnet
  • 15:29 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P80713 and previous config saved to /var/cache/conftool/dbconfig/20250804-152903-fceratto.json
  • 15:28 jgreen@dns1004: START - running authdns-update
  • 15:28 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:27 vriley@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
  • 15:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1043
  • 15:26 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 15:26 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1043
  • 15:25 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:25 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1043 - vriley@cumin1002"
  • 15:25 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt cloudcephosd1043 - vriley@cumin1002"
  • 15:24 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:24 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:23 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:22 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:22 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:20 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:19 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-airflow1006.eqiad.wmnet
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T400854)', diff saved to https://phabricator.wikimedia.org/P80712 and previous config saved to /var/cache/conftool/dbconfig/20250804-151910-ladsgroup.json
  • 15:18 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1005.eqiad.wmnet
  • 15:18 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:18 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:18 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T400854)', diff saved to https://phabricator.wikimedia.org/P80710 and previous config saved to /var/cache/conftool/dbconfig/20250804-151621-ladsgroup.json
  • 15:16 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 15:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T400854)', diff saved to https://phabricator.wikimedia.org/P80709 and previous config saved to /var/cache/conftool/dbconfig/20250804-151526-ladsgroup.json
  • 15:13 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80708 and previous config saved to /var/cache/conftool/dbconfig/20250804-151355-fceratto.json
  • 15:11 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T399728)', diff saved to https://phabricator.wikimedia.org/P80707 and previous config saved to /var/cache/conftool/dbconfig/20250804-151127-fceratto.json
  • 15:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T399728)', diff saved to https://phabricator.wikimedia.org/P80706 and previous config saved to /var/cache/conftool/dbconfig/20250804-151105-fceratto.json
  • 15:06 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-airflow1005.eqiad.wmnet
  • 15:05 kemayo@deploy1003: Finished scap sync-world: Backport for GutterSidebarEditCheckDialog: Guard against null bounding rects (duration: 08m 16s)
  • 15:04 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1004.eqiad.wmnet
  • 15:04 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:03 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 15:00 kemayo@deploy1003: kemayo: Continuing with sync
  • 15:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P80705 and previous config saved to /var/cache/conftool/dbconfig/20250804-150018-ladsgroup.json
  • 14:59 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 14:59 kemayo@deploy1003: kemayo: Backport for GutterSidebarEditCheckDialog: Guard against null bounding rects synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:57 kemayo@deploy1003: Started scap sync-world: Backport for GutterSidebarEditCheckDialog: Guard against null bounding rects
  • 14:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80704 and previous config saved to /var/cache/conftool/dbconfig/20250804-145557-fceratto.json
  • 14:54 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-airflow1004.eqiad.wmnet
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-airflow1002.eqiad.wmnet
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:52 brouberol@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:51 brouberol@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-airflow1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1003"
  • 14:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P80703 and previous config saved to /var/cache/conftool/dbconfig/20250804-144509-ladsgroup.json
  • 14:45 brouberol@cumin1003: START - Cookbook sre.dns.netbox
  • 14:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P80702 and previous config saved to /var/cache/conftool/dbconfig/20250804-144050-fceratto.json
  • 14:38 brouberol@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-airflow1002.eqiad.wmnet
  • 14:32 Lucas_WMDE: UTC afternoon backport+config window hopefully done after some difficulties
  • 14:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T400854)', diff saved to https://phabricator.wikimedia.org/P80701 and previous config saved to /var/cache/conftool/dbconfig/20250804-143001-ladsgroup.json
  • 14:28 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_BOTH (T397476) (duration: 16m 17s)
  • 14:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T399728)', diff saved to https://phabricator.wikimedia.org/P80700 and previous config saved to /var/cache/conftool/dbconfig/20250804-142542-fceratto.json
  • 14:23 XioNoX: push pfw policies - https://phabricator.wikimedia.org/T400936
  • 14:23 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T399728)', diff saved to https://phabricator.wikimedia.org/P80699 and previous config saved to /var/cache/conftool/dbconfig/20250804-142314-fceratto.json
  • 14:23 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1253 (T400854)', diff saved to https://phabricator.wikimedia.org/P80698 and previous config saved to /var/cache/conftool/dbconfig/20250804-142132-ladsgroup.json
  • 14:21 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1253.eqiad.wmnet with reason: Maintenance
  • 14:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80697 and previous config saved to /var/cache/conftool/dbconfig/20250804-142109-ladsgroup.json
  • 14:19 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Continuing with sync
  • 14:16 lucaswerkmeister-wmde@deploy1003: daimona, lucaswerkmeister-wmde: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_BOTH (T397476) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:12 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Set wgCampaignEventsCountrySchemaMigrationStage to MIGRATION_WRITE_BOTH (T397476)
  • 14:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P80696 and previous config saved to /var/cache/conftool/dbconfig/20250804-140602-ladsgroup.json
  • 14:00 ebernhardson: T317599 start full-cluster reindex for eqiad/codfw/cloudelastic opensearch clusters
  • 13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P80695 and previous config saved to /var/cache/conftool/dbconfig/20250804-135054-ladsgroup.json
  • 13:42 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:42 jnuche@deploy1003: Finished deploy [releng/jenkins-deploy@b89eed0] (releasing): check fix for releases2003 (duration: 00m 26s)
  • 13:41 jnuche@deploy1003: Started deploy [releng/jenkins-deploy@b89eed0] (releasing): check fix for releases2003
  • 13:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
  • 13:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
  • 13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80694 and previous config saved to /var/cache/conftool/dbconfig/20250804-133547-ladsgroup.json
  • 13:34 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:33 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
  • 13:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80693 and previous config saved to /var/cache/conftool/dbconfig/20250804-133314-ladsgroup.json
  • 13:33 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T400854)', diff saved to https://phabricator.wikimedia.org/P80692 and previous config saved to /var/cache/conftool/dbconfig/20250804-133251-ladsgroup.json
  • 13:30 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, kharlan: Continuing with sync
  • 13:26 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, kharlan: Backport for UserInfoCard: Add config var for making UIC available (T400627), CheckUser: Make user info card feature discoverable (T398681) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:24 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for UserInfoCard: Add config var for making UIC available (T400627), CheckUser: Make user info card feature discoverable (T398681)
  • 13:22 lucaswerkmeister-wmde@deploy1003: Finished scap sync-world: Backport for Use tempaccounts.dblist to manage rollout wikis (T400672) (duration: 16m 31s)
  • 13:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P80691 and previous config saved to /var/cache/conftool/dbconfig/20250804-131744-ladsgroup.json
  • 13:15 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 13:14 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 13:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T399728)', diff saved to https://phabricator.wikimedia.org/P80690 and previous config saved to /var/cache/conftool/dbconfig/20250804-131417-fceratto.json
  • 13:14 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, stran: Continuing with sync
  • 13:07 lucaswerkmeister-wmde@deploy1003: lucaswerkmeister-wmde, stran: Backport for Use tempaccounts.dblist to manage rollout wikis (T400672) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 13:05 lucaswerkmeister-wmde@deploy1003: Started scap sync-world: Backport for Use tempaccounts.dblist to manage rollout wikis (T400672)
  • 13:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P80689 and previous config saved to /var/cache/conftool/dbconfig/20250804-130236-ladsgroup.json
  • 12:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P80688 and previous config saved to /var/cache/conftool/dbconfig/20250804-125909-fceratto.json
  • 12:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 12:50 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
  • 12:49 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
  • 12:48 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
  • 12:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T400854)', diff saved to https://phabricator.wikimedia.org/P80687 and previous config saved to /var/cache/conftool/dbconfig/20250804-124729-ladsgroup.json
  • 12:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T400854)', diff saved to https://phabricator.wikimedia.org/P80686 and previous config saved to /var/cache/conftool/dbconfig/20250804-124500-ladsgroup.json
  • 12:44 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80685 and previous config saved to /var/cache/conftool/dbconfig/20250804-124438-ladsgroup.json
  • 12:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P80684 and previous config saved to /var/cache/conftool/dbconfig/20250804-124402-fceratto.json
  • 12:41 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 12:40 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
  • 12:37 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
  • 12:36 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
  • 12:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 12:35 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
  • 12:34 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:33 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 12:31 dcausse: repooling wdqs1011
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P80683 and previous config saved to /var/cache/conftool/dbconfig/20250804-122931-ladsgroup.json
  • 12:28 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T399728)', diff saved to https://phabricator.wikimedia.org/P80682 and previous config saved to /var/cache/conftool/dbconfig/20250804-122855-fceratto.json
  • 12:26 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T399728)', diff saved to https://phabricator.wikimedia.org/P80681 and previous config saved to /var/cache/conftool/dbconfig/20250804-122614-fceratto.json
  • 12:26 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 12:25 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 12:24 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T399728)', diff saved to https://phabricator.wikimedia.org/P80680 and previous config saved to /var/cache/conftool/dbconfig/20250804-122454-fceratto.json
  • 12:22 dcausse: depooling & restarting blazegraph on wdqs1011 (stuck for 3 hours)
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P80679 and previous config saved to /var/cache/conftool/dbconfig/20250804-121424-ladsgroup.json
  • 12:10 dcausse: depooling & restarting blazegraph on wdqs1016 (stuck for 7 days)
  • 12:09 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P80678 and previous config saved to /var/cache/conftool/dbconfig/20250804-120946-fceratto.json
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80677 and previous config saved to /var/cache/conftool/dbconfig/20250804-115917-ladsgroup.json
  • 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80676 and previous config saved to /var/cache/conftool/dbconfig/20250804-115649-ladsgroup.json
  • 11:56 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T400854)', diff saved to https://phabricator.wikimedia.org/P80675 and previous config saved to /var/cache/conftool/dbconfig/20250804-115626-ladsgroup.json
  • 11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P80674 and previous config saved to /var/cache/conftool/dbconfig/20250804-115438-fceratto.json
  • 11:42 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P80673 and previous config saved to /var/cache/conftool/dbconfig/20250804-114119-ladsgroup.json
  • 11:39 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T399728)', diff saved to https://phabricator.wikimedia.org/P80672 and previous config saved to /var/cache/conftool/dbconfig/20250804-113931-fceratto.json
  • 11:39 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T399728)', diff saved to https://phabricator.wikimedia.org/P80671 and previous config saved to /var/cache/conftool/dbconfig/20250804-113649-fceratto.json
  • 11:36 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 11:36 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T399728)', diff saved to https://phabricator.wikimedia.org/P80670 and previous config saved to /var/cache/conftool/dbconfig/20250804-113625-fceratto.json
  • 11:28 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P80668 and previous config saved to /var/cache/conftool/dbconfig/20250804-112612-ladsgroup.json
  • 11:25 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:21 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P80667 and previous config saved to /var/cache/conftool/dbconfig/20250804-112118-fceratto.json
  • 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:16 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T400854)', diff saved to https://phabricator.wikimedia.org/P80666 and previous config saved to /var/cache/conftool/dbconfig/20250804-111103-ladsgroup.json
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T400854)', diff saved to https://phabricator.wikimedia.org/P80665 and previous config saved to /var/cache/conftool/dbconfig/20250804-110834-ladsgroup.json
  • 11:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T400854)', diff saved to https://phabricator.wikimedia.org/P80664 and previous config saved to /var/cache/conftool/dbconfig/20250804-110811-ladsgroup.json
  • 11:06 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P80663 and previous config saved to /var/cache/conftool/dbconfig/20250804-110609-fceratto.json
  • 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P80662 and previous config saved to /var/cache/conftool/dbconfig/20250804-105303-ladsgroup.json
  • 10:51 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T399728)', diff saved to https://phabricator.wikimedia.org/P80661 and previous config saved to /var/cache/conftool/dbconfig/20250804-105101-fceratto.json
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T399728)', diff saved to https://phabricator.wikimedia.org/P80660 and previous config saved to /var/cache/conftool/dbconfig/20250804-104823-fceratto.json
  • 10:48 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 10:48 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T399728)', diff saved to https://phabricator.wikimedia.org/P80659 and previous config saved to /var/cache/conftool/dbconfig/20250804-104800-fceratto.json
  • 10:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P80658 and previous config saved to /var/cache/conftool/dbconfig/20250804-103756-ladsgroup.json
  • 10:32 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P80657 and previous config saved to /var/cache/conftool/dbconfig/20250804-103252-fceratto.json
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T400854)', diff saved to https://phabricator.wikimedia.org/P80656 and previous config saved to /var/cache/conftool/dbconfig/20250804-102248-ladsgroup.json
  • 10:17 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P80655 and previous config saved to /var/cache/conftool/dbconfig/20250804-101745-fceratto.json
  • 10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T400854)', diff saved to https://phabricator.wikimedia.org/P80654 and previous config saved to /var/cache/conftool/dbconfig/20250804-101421-ladsgroup.json
  • 10:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T400854)', diff saved to https://phabricator.wikimedia.org/P80653 and previous config saved to /var/cache/conftool/dbconfig/20250804-101358-ladsgroup.json
  • 10:07 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 10:02 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T399728)', diff saved to https://phabricator.wikimedia.org/P80652 and previous config saved to /var/cache/conftool/dbconfig/20250804-100237-fceratto.json
  • 10:00 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T399728)', diff saved to https://phabricator.wikimedia.org/P80651 and previous config saved to /var/cache/conftool/dbconfig/20250804-095958-fceratto.json
  • 09:59 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 09:59 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T399728)', diff saved to https://phabricator.wikimedia.org/P80650 and previous config saved to /var/cache/conftool/dbconfig/20250804-095935-fceratto.json
  • 09:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P80649 and previous config saved to /var/cache/conftool/dbconfig/20250804-095851-ladsgroup.json
  • 09:46 fceratto@deploy1003: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
  • 09:44 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P80648 and previous config saved to /var/cache/conftool/dbconfig/20250804-094428-fceratto.json
  • 09:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P80647 and previous config saved to /var/cache/conftool/dbconfig/20250804-094343-ladsgroup.json
  • 09:29 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P80646 and previous config saved to /var/cache/conftool/dbconfig/20250804-092920-fceratto.json
  • 09:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T400854)', diff saved to https://phabricator.wikimedia.org/P80645 and previous config saved to /var/cache/conftool/dbconfig/20250804-092836-ladsgroup.json
  • 09:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T400854)', diff saved to https://phabricator.wikimedia.org/P80644 and previous config saved to /var/cache/conftool/dbconfig/20250804-092606-ladsgroup.json
  • 09:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 09:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 09:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T400854)', diff saved to https://phabricator.wikimedia.org/P80643 and previous config saved to /var/cache/conftool/dbconfig/20250804-092445-ladsgroup.json
  • 09:14 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T399728)', diff saved to https://phabricator.wikimedia.org/P80642 and previous config saved to /var/cache/conftool/dbconfig/20250804-091413-fceratto.json
  • 09:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T399728)', diff saved to https://phabricator.wikimedia.org/P80641 and previous config saved to /var/cache/conftool/dbconfig/20250804-091128-fceratto.json
  • 09:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 09:11 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 09:10 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80640 and previous config saved to /var/cache/conftool/dbconfig/20250804-091048-fceratto.json
  • 09:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P80639 and previous config saved to /var/cache/conftool/dbconfig/20250804-090938-ladsgroup.json
  • 08:55 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P80638 and previous config saved to /var/cache/conftool/dbconfig/20250804-085540-fceratto.json
  • 08:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P80637 and previous config saved to /var/cache/conftool/dbconfig/20250804-085430-ladsgroup.json
  • 08:40 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P80636 and previous config saved to /var/cache/conftool/dbconfig/20250804-084032-fceratto.json
  • 08:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T400854)', diff saved to https://phabricator.wikimedia.org/P80635 and previous config saved to /var/cache/conftool/dbconfig/20250804-083921-ladsgroup.json
  • 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T400854)', diff saved to https://phabricator.wikimedia.org/P80634 and previous config saved to /var/cache/conftool/dbconfig/20250804-083646-ladsgroup.json
  • 08:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 08:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80633 and previous config saved to /var/cache/conftool/dbconfig/20250804-083623-ladsgroup.json
  • 08:25 fceratto@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80632 and previous config saved to /var/cache/conftool/dbconfig/20250804-082524-fceratto.json
  • 08:22 fceratto@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T399728)', diff saved to https://phabricator.wikimedia.org/P80631 and previous config saved to /var/cache/conftool/dbconfig/20250804-082237-fceratto.json
  • 08:22 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 08:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P80630 and previous config saved to /var/cache/conftool/dbconfig/20250804-082116-ladsgroup.json
  • 08:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P80629 and previous config saved to /var/cache/conftool/dbconfig/20250804-080608-ladsgroup.json
  • 07:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80628 and previous config saved to /var/cache/conftool/dbconfig/20250804-075101-ladsgroup.json
  • 07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T400854)', diff saved to https://phabricator.wikimedia.org/P80627 and previous config saved to /var/cache/conftool/dbconfig/20250804-074333-ladsgroup.json
  • 07:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:50 _joe_: deleting unhealthy thumbor pods
  • 06:26 _joe_: defragmented etcd k8s cluster in eqiad
  • 05:25 tstarling@deploy1003: Finished scap sync-world: Backport for In sitemap responses set CC: public (T400023) (duration: 37m 03s)
  • 05:13 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "UX improvements - oblivian@cumin1003"
  • 05:13 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: UX improvements - oblivian@cumin1003
  • 05:13 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: UX improvements - oblivian@cumin1003
  • 05:13 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "UX improvements - oblivian@cumin1003"
  • 05:13 tstarling@deploy1003: krinkle, tstarling: Continuing with sync
  • 05:09 tstarling@deploy1003: krinkle, tstarling: Backport for In sitemap responses set CC: public (T400023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 04:48 tstarling@deploy1003: Started scap sync-world: Backport for In sitemap responses set CC: public (T400023)
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 11m 03s)
  • 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T400854)', diff saved to https://phabricator.wikimedia.org/P80626 and previous config saved to /var/cache/conftool/dbconfig/20250804-010722-ladsgroup.json
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P80625 and previous config saved to /var/cache/conftool/dbconfig/20250804-005214-ladsgroup.json
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P80624 and previous config saved to /var/cache/conftool/dbconfig/20250804-003706-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T400854)', diff saved to https://phabricator.wikimedia.org/P80623 and previous config saved to /var/cache/conftool/dbconfig/20250804-002159-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2238 (T400854)', diff saved to https://phabricator.wikimedia.org/P80622 and previous config saved to /var/cache/conftool/dbconfig/20250804-001908-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2238.codfw.wmnet with reason: Maintenance
  • 00:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T400854)', diff saved to https://phabricator.wikimedia.org/P80621 and previous config saved to /var/cache/conftool/dbconfig/20250804-001845-ladsgroup.json
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P80620 and previous config saved to /var/cache/conftool/dbconfig/20250804-000337-ladsgroup.json

2025-08-03

  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P80619 and previous config saved to /var/cache/conftool/dbconfig/20250803-234829-ladsgroup.json
  • 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T400854)', diff saved to https://phabricator.wikimedia.org/P80618 and previous config saved to /var/cache/conftool/dbconfig/20250803-233322-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2226 (T400854)', diff saved to https://phabricator.wikimedia.org/P80617 and previous config saved to /var/cache/conftool/dbconfig/20250803-233037-ladsgroup.json
  • 23:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2226.codfw.wmnet with reason: Maintenance
  • 23:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T400854)', diff saved to https://phabricator.wikimedia.org/P80616 and previous config saved to /var/cache/conftool/dbconfig/20250803-233013-ladsgroup.json
  • 23:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P80615 and previous config saved to /var/cache/conftool/dbconfig/20250803-231505-ladsgroup.json
  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P80614 and previous config saved to /var/cache/conftool/dbconfig/20250803-225957-ladsgroup.json
  • 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T400854)', diff saved to https://phabricator.wikimedia.org/P80613 and previous config saved to /var/cache/conftool/dbconfig/20250803-224450-ladsgroup.json
  • 22:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2225 (T400854)', diff saved to https://phabricator.wikimedia.org/P80612 and previous config saved to /var/cache/conftool/dbconfig/20250803-224159-ladsgroup.json
  • 22:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2225.codfw.wmnet with reason: Maintenance
  • 22:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80611 and previous config saved to /var/cache/conftool/dbconfig/20250803-224147-ladsgroup.json
  • 22:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P80610 and previous config saved to /var/cache/conftool/dbconfig/20250803-222640-ladsgroup.json
  • 22:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P80609 and previous config saved to /var/cache/conftool/dbconfig/20250803-221132-ladsgroup.json
  • 21:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80608 and previous config saved to /var/cache/conftool/dbconfig/20250803-215625-ladsgroup.json
  • 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80607 and previous config saved to /var/cache/conftool/dbconfig/20250803-215335-ladsgroup.json
  • 21:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 21:51 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 21:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80606 and previous config saved to /var/cache/conftool/dbconfig/20250803-215131-ladsgroup.json
  • 21:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P80605 and previous config saved to /var/cache/conftool/dbconfig/20250803-213623-ladsgroup.json
  • 21:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P80604 and previous config saved to /var/cache/conftool/dbconfig/20250803-212116-ladsgroup.json
  • 21:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80603 and previous config saved to /var/cache/conftool/dbconfig/20250803-210608-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80602 and previous config saved to /var/cache/conftool/dbconfig/20250803-210318-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 21:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80601 and previous config saved to /var/cache/conftool/dbconfig/20250803-210255-ladsgroup.json
  • 20:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P80600 and previous config saved to /var/cache/conftool/dbconfig/20250803-204747-ladsgroup.json
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P80599 and previous config saved to /var/cache/conftool/dbconfig/20250803-203238-ladsgroup.json
  • 20:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80598 and previous config saved to /var/cache/conftool/dbconfig/20250803-201730-ladsgroup.json
  • 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80597 and previous config saved to /var/cache/conftool/dbconfig/20250803-201435-ladsgroup.json
  • 20:14 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 20:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T400854)', diff saved to https://phabricator.wikimedia.org/P80596 and previous config saved to /var/cache/conftool/dbconfig/20250803-201412-ladsgroup.json
  • 19:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P80595 and previous config saved to /var/cache/conftool/dbconfig/20250803-195904-ladsgroup.json
  • 19:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P80594 and previous config saved to /var/cache/conftool/dbconfig/20250803-194357-ladsgroup.json
  • 19:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T400854)', diff saved to https://phabricator.wikimedia.org/P80593 and previous config saved to /var/cache/conftool/dbconfig/20250803-192846-ladsgroup.json
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T400854)', diff saved to https://phabricator.wikimedia.org/P80592 and previous config saved to /var/cache/conftool/dbconfig/20250803-192551-ladsgroup.json
  • 19:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T400854)', diff saved to https://phabricator.wikimedia.org/P80591 and previous config saved to /var/cache/conftool/dbconfig/20250803-192426-ladsgroup.json
  • 19:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P80590 and previous config saved to /var/cache/conftool/dbconfig/20250803-190919-ladsgroup.json
  • 18:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P80589 and previous config saved to /var/cache/conftool/dbconfig/20250803-185411-ladsgroup.json
  • 18:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T400854)', diff saved to https://phabricator.wikimedia.org/P80588 and previous config saved to /var/cache/conftool/dbconfig/20250803-183904-ladsgroup.json
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1259 (T400854)', diff saved to https://phabricator.wikimedia.org/P80587 and previous config saved to /var/cache/conftool/dbconfig/20250803-183624-ladsgroup.json
  • 18:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1259.eqiad.wmnet with reason: Maintenance
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T400854)', diff saved to https://phabricator.wikimedia.org/P80586 and previous config saved to /var/cache/conftool/dbconfig/20250803-183601-ladsgroup.json
  • 18:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P80585 and previous config saved to /var/cache/conftool/dbconfig/20250803-182054-ladsgroup.json
  • 18:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20250803-180541-ladsgroup.json
  • 17:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T400854)', diff saved to https://phabricator.wikimedia.org/P80583 and previous config saved to /var/cache/conftool/dbconfig/20250803-175034-ladsgroup.json
  • 17:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1254 (T400854)', diff saved to https://phabricator.wikimedia.org/P80582 and previous config saved to /var/cache/conftool/dbconfig/20250803-174354-ladsgroup.json
  • 17:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1254.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T400854)', diff saved to https://phabricator.wikimedia.org/P80581 and previous config saved to /var/cache/conftool/dbconfig/20250803-174235-ladsgroup.json
  • 17:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P80580 and previous config saved to /var/cache/conftool/dbconfig/20250803-172727-ladsgroup.json
  • 17:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P80579 and previous config saved to /var/cache/conftool/dbconfig/20250803-171218-ladsgroup.json
  • 16:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T400854)', diff saved to https://phabricator.wikimedia.org/P80578 and previous config saved to /var/cache/conftool/dbconfig/20250803-165710-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T400854)', diff saved to https://phabricator.wikimedia.org/P80577 and previous config saved to /var/cache/conftool/dbconfig/20250803-165432-ladsgroup.json
  • 16:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 16:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T400854)', diff saved to https://phabricator.wikimedia.org/P80576 and previous config saved to /var/cache/conftool/dbconfig/20250803-165409-ladsgroup.json
  • 16:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P80575 and previous config saved to /var/cache/conftool/dbconfig/20250803-163901-ladsgroup.json
  • 16:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P80574 and previous config saved to /var/cache/conftool/dbconfig/20250803-162354-ladsgroup.json
  • 16:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T400854)', diff saved to https://phabricator.wikimedia.org/P80573 and previous config saved to /var/cache/conftool/dbconfig/20250803-160846-ladsgroup.json
  • 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T400854)', diff saved to https://phabricator.wikimedia.org/P80572 and previous config saved to /var/cache/conftool/dbconfig/20250803-160616-ladsgroup.json
  • 16:06 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 16:05 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T400854)', diff saved to https://phabricator.wikimedia.org/P80571 and previous config saved to /var/cache/conftool/dbconfig/20250803-160455-ladsgroup.json
  • 15:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P80570 and previous config saved to /var/cache/conftool/dbconfig/20250803-154947-ladsgroup.json
  • 15:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P80569 and previous config saved to /var/cache/conftool/dbconfig/20250803-153439-ladsgroup.json
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T400854)', diff saved to https://phabricator.wikimedia.org/P80568 and previous config saved to /var/cache/conftool/dbconfig/20250803-151932-ladsgroup.json
  • 15:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T400854)', diff saved to https://phabricator.wikimedia.org/P80567 and previous config saved to /var/cache/conftool/dbconfig/20250803-151702-ladsgroup.json
  • 15:16 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 15:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T400854)', diff saved to https://phabricator.wikimedia.org/P80566 and previous config saved to /var/cache/conftool/dbconfig/20250803-151639-ladsgroup.json
  • 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P80565 and previous config saved to /var/cache/conftool/dbconfig/20250803-150132-ladsgroup.json
  • 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P80564 and previous config saved to /var/cache/conftool/dbconfig/20250803-144624-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T400854)', diff saved to https://phabricator.wikimedia.org/P80563 and previous config saved to /var/cache/conftool/dbconfig/20250803-143117-ladsgroup.json
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T400854)', diff saved to https://phabricator.wikimedia.org/P80562 and previous config saved to /var/cache/conftool/dbconfig/20250803-142847-ladsgroup.json
  • 14:28 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T400854)', diff saved to https://phabricator.wikimedia.org/P80561 and previous config saved to /var/cache/conftool/dbconfig/20250803-142824-ladsgroup.json
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P80560 and previous config saved to /var/cache/conftool/dbconfig/20250803-141316-ladsgroup.json
  • 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P80559 and previous config saved to /var/cache/conftool/dbconfig/20250803-135808-ladsgroup.json
  • 13:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T400854)', diff saved to https://phabricator.wikimedia.org/P80558 and previous config saved to /var/cache/conftool/dbconfig/20250803-134300-ladsgroup.json
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T400854)', diff saved to https://phabricator.wikimedia.org/P80557 and previous config saved to /var/cache/conftool/dbconfig/20250803-134019-ladsgroup.json
  • 13:40 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T400854)', diff saved to https://phabricator.wikimedia.org/P80556 and previous config saved to /var/cache/conftool/dbconfig/20250803-134008-ladsgroup.json
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P80555 and previous config saved to /var/cache/conftool/dbconfig/20250803-132500-ladsgroup.json
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P80554 and previous config saved to /var/cache/conftool/dbconfig/20250803-130952-ladsgroup.json
  • 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T400854)', diff saved to https://phabricator.wikimedia.org/P80553 and previous config saved to /var/cache/conftool/dbconfig/20250803-125444-ladsgroup.json
  • 12:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T400854)', diff saved to https://phabricator.wikimedia.org/P80552 and previous config saved to /var/cache/conftool/dbconfig/20250803-125214-ladsgroup.json
  • 12:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 12:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80551 and previous config saved to /var/cache/conftool/dbconfig/20250803-125152-ladsgroup.json
  • 12:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P80550 and previous config saved to /var/cache/conftool/dbconfig/20250803-123644-ladsgroup.json
  • 12:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P80549 and previous config saved to /var/cache/conftool/dbconfig/20250803-122136-ladsgroup.json
  • 12:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80548 and previous config saved to /var/cache/conftool/dbconfig/20250803-120629-ladsgroup.json
  • 12:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80547 and previous config saved to /var/cache/conftool/dbconfig/20250803-120346-ladsgroup.json
  • 12:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 12:03 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1156.eqiad.wmnet with reason: Maintenance

2025-08-02

  • 21:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2239.codfw.wmnet with reason: Maintenance
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80546 and previous config saved to /var/cache/conftool/dbconfig/20250802-214929-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P80544 and previous config saved to /var/cache/conftool/dbconfig/20250802-213421-ladsgroup.json
  • 21:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P80543 and previous config saved to /var/cache/conftool/dbconfig/20250802-211914-ladsgroup.json
  • 21:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80542 and previous config saved to /var/cache/conftool/dbconfig/20250802-210406-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2227 (T400854)', diff saved to https://phabricator.wikimedia.org/P80541 and previous config saved to /var/cache/conftool/dbconfig/20250802-204951-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2227.codfw.wmnet with reason: Maintenance
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T400854)', diff saved to https://phabricator.wikimedia.org/P80540 and previous config saved to /var/cache/conftool/dbconfig/20250802-204928-ladsgroup.json
  • 20:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P80539 and previous config saved to /var/cache/conftool/dbconfig/20250802-203421-ladsgroup.json
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P80538 and previous config saved to /var/cache/conftool/dbconfig/20250802-201913-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T400854)', diff saved to https://phabricator.wikimedia.org/P80537 and previous config saved to /var/cache/conftool/dbconfig/20250802-200405-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T400854)', diff saved to https://phabricator.wikimedia.org/P80536 and previous config saved to /var/cache/conftool/dbconfig/20250802-194953-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80535 and previous config saved to /var/cache/conftool/dbconfig/20250802-194931-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P80534 and previous config saved to /var/cache/conftool/dbconfig/20250802-193423-ladsgroup.json
  • 19:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P80533 and previous config saved to /var/cache/conftool/dbconfig/20250802-191915-ladsgroup.json
  • 19:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80532 and previous config saved to /var/cache/conftool/dbconfig/20250802-190408-ladsgroup.json
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T400854)', diff saved to https://phabricator.wikimedia.org/P80531 and previous config saved to /var/cache/conftool/dbconfig/20250802-184952-ladsgroup.json
  • 18:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T400854)', diff saved to https://phabricator.wikimedia.org/P80530 and previous config saved to /var/cache/conftool/dbconfig/20250802-184929-ladsgroup.json
  • 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P80529 and previous config saved to /var/cache/conftool/dbconfig/20250802-183421-ladsgroup.json
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P80528 and previous config saved to /var/cache/conftool/dbconfig/20250802-181914-ladsgroup.json
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T400854)', diff saved to https://phabricator.wikimedia.org/P80527 and previous config saved to /var/cache/conftool/dbconfig/20250802-180406-ladsgroup.json
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T400854)', diff saved to https://phabricator.wikimedia.org/P80526 and previous config saved to /var/cache/conftool/dbconfig/20250802-174952-ladsgroup.json
  • 17:49 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T400854)', diff saved to https://phabricator.wikimedia.org/P80525 and previous config saved to /var/cache/conftool/dbconfig/20250802-174929-ladsgroup.json
  • 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P80524 and previous config saved to /var/cache/conftool/dbconfig/20250802-173422-ladsgroup.json
  • 17:33 hnowlan: clean up some misbehaving thumbor pods
  • 17:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P80523 and previous config saved to /var/cache/conftool/dbconfig/20250802-171914-ladsgroup.json
  • 17:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T400854)', diff saved to https://phabricator.wikimedia.org/P80522 and previous config saved to /var/cache/conftool/dbconfig/20250802-170407-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T400854)', diff saved to https://phabricator.wikimedia.org/P80521 and previous config saved to /var/cache/conftool/dbconfig/20250802-165012-ladsgroup.json
  • 16:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 16:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80520 and previous config saved to /var/cache/conftool/dbconfig/20250802-164949-ladsgroup.json
  • 16:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P80519 and previous config saved to /var/cache/conftool/dbconfig/20250802-163441-ladsgroup.json
  • 16:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P80518 and previous config saved to /var/cache/conftool/dbconfig/20250802-161933-ladsgroup.json
  • 16:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80517 and previous config saved to /var/cache/conftool/dbconfig/20250802-160426-ladsgroup.json
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T400854)', diff saved to https://phabricator.wikimedia.org/P80516 and previous config saved to /var/cache/conftool/dbconfig/20250802-155032-ladsgroup.json
  • 15:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T400854)', diff saved to https://phabricator.wikimedia.org/P80515 and previous config saved to /var/cache/conftool/dbconfig/20250802-155008-ladsgroup.json
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P80514 and previous config saved to /var/cache/conftool/dbconfig/20250802-153501-ladsgroup.json
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P80513 and previous config saved to /var/cache/conftool/dbconfig/20250802-151953-ladsgroup.json
  • 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T400854)', diff saved to https://phabricator.wikimedia.org/P80512 and previous config saved to /var/cache/conftool/dbconfig/20250802-150446-ladsgroup.json
  • 14:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T400854)', diff saved to https://phabricator.wikimedia.org/P80511 and previous config saved to /var/cache/conftool/dbconfig/20250802-145049-ladsgroup.json
  • 14:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 14:47 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 14:43 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 14:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T400854)', diff saved to https://phabricator.wikimedia.org/P80510 and previous config saved to /var/cache/conftool/dbconfig/20250802-144311-ladsgroup.json
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P80509 and previous config saved to /var/cache/conftool/dbconfig/20250802-142803-ladsgroup.json
  • 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P80508 and previous config saved to /var/cache/conftool/dbconfig/20250802-141256-ladsgroup.json
  • 13:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T400854)', diff saved to https://phabricator.wikimedia.org/P80507 and previous config saved to /var/cache/conftool/dbconfig/20250802-135748-ladsgroup.json
  • 13:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T400854)', diff saved to https://phabricator.wikimedia.org/P80506 and previous config saved to /var/cache/conftool/dbconfig/20250802-135234-ladsgroup.json
  • 13:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 13:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 13:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T400854)', diff saved to https://phabricator.wikimedia.org/P80505 and previous config saved to /var/cache/conftool/dbconfig/20250802-135152-ladsgroup.json
  • 13:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P80504 and previous config saved to /var/cache/conftool/dbconfig/20250802-133645-ladsgroup.json
  • 13:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P80503 and previous config saved to /var/cache/conftool/dbconfig/20250802-132137-ladsgroup.json
  • 13:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T400854)', diff saved to https://phabricator.wikimedia.org/P80502 and previous config saved to /var/cache/conftool/dbconfig/20250802-130629-ladsgroup.json
  • 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T400854)', diff saved to https://phabricator.wikimedia.org/P80501 and previous config saved to /var/cache/conftool/dbconfig/20250802-130143-ladsgroup.json
  • 13:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80500 and previous config saved to /var/cache/conftool/dbconfig/20250802-130120-ladsgroup.json
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P80499 and previous config saved to /var/cache/conftool/dbconfig/20250802-124612-ladsgroup.json
  • 12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P80498 and previous config saved to /var/cache/conftool/dbconfig/20250802-123105-ladsgroup.json
  • 12:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80497 and previous config saved to /var/cache/conftool/dbconfig/20250802-121557-ladsgroup.json
  • 12:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T400854)', diff saved to https://phabricator.wikimedia.org/P80496 and previous config saved to /var/cache/conftool/dbconfig/20250802-121112-ladsgroup.json
  • 12:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80495 and previous config saved to /var/cache/conftool/dbconfig/20250802-121050-ladsgroup.json
  • 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P80494 and previous config saved to /var/cache/conftool/dbconfig/20250802-115542-ladsgroup.json
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P80493 and previous config saved to /var/cache/conftool/dbconfig/20250802-114035-ladsgroup.json
  • 11:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80492 and previous config saved to /var/cache/conftool/dbconfig/20250802-112527-ladsgroup.json
  • 11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T400854)', diff saved to https://phabricator.wikimedia.org/P80491 and previous config saved to /var/cache/conftool/dbconfig/20250802-112037-ladsgroup.json
  • 11:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 11:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T400854)', diff saved to https://phabricator.wikimedia.org/P80490 and previous config saved to /var/cache/conftool/dbconfig/20250802-112015-ladsgroup.json
  • 11:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P80489 and previous config saved to /var/cache/conftool/dbconfig/20250802-110507-ladsgroup.json
  • 10:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P80488 and previous config saved to /var/cache/conftool/dbconfig/20250802-104959-ladsgroup.json
  • 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T400854)', diff saved to https://phabricator.wikimedia.org/P80487 and previous config saved to /var/cache/conftool/dbconfig/20250802-103452-ladsgroup.json
  • 10:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T400854)', diff saved to https://phabricator.wikimedia.org/P80486 and previous config saved to /var/cache/conftool/dbconfig/20250802-103001-ladsgroup.json
  • 10:29 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80485 and previous config saved to /var/cache/conftool/dbconfig/20250802-102938-ladsgroup.json
  • 10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P80484 and previous config saved to /var/cache/conftool/dbconfig/20250802-101431-ladsgroup.json
  • 09:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P80483 and previous config saved to /var/cache/conftool/dbconfig/20250802-095923-ladsgroup.json
  • 09:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80482 and previous config saved to /var/cache/conftool/dbconfig/20250802-094416-ladsgroup.json
  • 09:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80481 and previous config saved to /var/cache/conftool/dbconfig/20250802-093924-ladsgroup.json
  • 09:39 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 09:36 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 52s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image

2025-08-01

  • 23:57 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_main on wdqs1022.eqiad.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20250714/ using stat1009.eqiad.wmnet)
  • 21:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T400854)', diff saved to https://phabricator.wikimedia.org/P80480 and previous config saved to /var/cache/conftool/dbconfig/20250801-213802-ladsgroup.json
  • 21:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P80479 and previous config saved to /var/cache/conftool/dbconfig/20250801-212254-ladsgroup.json
  • 21:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228', diff saved to https://phabricator.wikimedia.org/P80478 and previous config saved to /var/cache/conftool/dbconfig/20250801-210746-ladsgroup.json
  • 20:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2228 (T400854)', diff saved to https://phabricator.wikimedia.org/P80477 and previous config saved to /var/cache/conftool/dbconfig/20250801-205239-ladsgroup.json
  • 20:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2228 (T400854)', diff saved to https://phabricator.wikimedia.org/P80476 and previous config saved to /var/cache/conftool/dbconfig/20250801-204903-ladsgroup.json
  • 20:48 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2228.codfw.wmnet with reason: Maintenance
  • 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T400854)', diff saved to https://phabricator.wikimedia.org/P80475 and previous config saved to /var/cache/conftool/dbconfig/20250801-204840-ladsgroup.json
  • 20:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P80474 and previous config saved to /var/cache/conftool/dbconfig/20250801-203332-ladsgroup.json
  • 20:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223', diff saved to https://phabricator.wikimedia.org/P80473 and previous config saved to /var/cache/conftool/dbconfig/20250801-201825-ladsgroup.json
  • 20:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2223 (T400854)', diff saved to https://phabricator.wikimedia.org/P80472 and previous config saved to /var/cache/conftool/dbconfig/20250801-200317-ladsgroup.json
  • 19:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2223 (T400854)', diff saved to https://phabricator.wikimedia.org/P80471 and previous config saved to /var/cache/conftool/dbconfig/20250801-195940-ladsgroup.json
  • 19:59 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2223.codfw.wmnet with reason: Maintenance
  • 19:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T400854)', diff saved to https://phabricator.wikimedia.org/P80470 and previous config saved to /var/cache/conftool/dbconfig/20250801-195917-ladsgroup.json
  • 19:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P80468 and previous config saved to /var/cache/conftool/dbconfig/20250801-194409-ladsgroup.json
  • 19:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P80467 and previous config saved to /var/cache/conftool/dbconfig/20250801-192901-ladsgroup.json
  • 19:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T400854)', diff saved to https://phabricator.wikimedia.org/P80466 and previous config saved to /var/cache/conftool/dbconfig/20250801-191354-ladsgroup.json
  • 19:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T400854)', diff saved to https://phabricator.wikimedia.org/P80465 and previous config saved to /var/cache/conftool/dbconfig/20250801-191016-ladsgroup.json
  • 19:10 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T400854)', diff saved to https://phabricator.wikimedia.org/P80464 and previous config saved to /var/cache/conftool/dbconfig/20250801-190817-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P80463 and previous config saved to /var/cache/conftool/dbconfig/20250801-185310-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P80462 and previous config saved to /var/cache/conftool/dbconfig/20250801-183802-ladsgroup.json
  • 18:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T400854)', diff saved to https://phabricator.wikimedia.org/P80461 and previous config saved to /var/cache/conftool/dbconfig/20250801-182254-ladsgroup.json
  • 18:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T400854)', diff saved to https://phabricator.wikimedia.org/P80460 and previous config saved to /var/cache/conftool/dbconfig/20250801-182017-ladsgroup.json
  • 18:20 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T400854)', diff saved to https://phabricator.wikimedia.org/P80459 and previous config saved to /var/cache/conftool/dbconfig/20250801-181954-ladsgroup.json
  • 18:16 cjming@deploy1003: Finished scap sync-world: Backport for Revert^2 "MetricsPlatform: Disable synchronous configs fetching" (duration: 09m 13s)
  • 18:10 cjming@deploy1003: cjming: Continuing with sync
  • 18:09 cjming@deploy1003: cjming: Backport for Revert^2 "MetricsPlatform: Disable synchronous configs fetching" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 18:07 cjming@deploy1003: Started scap sync-world: Backport for Revert^2 "MetricsPlatform: Disable synchronous configs fetching"
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P80458 and previous config saved to /var/cache/conftool/dbconfig/20250801-180447-ladsgroup.json
  • 17:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P80457 and previous config saved to /var/cache/conftool/dbconfig/20250801-174939-ladsgroup.json
  • 17:35 cjming@deploy1003: Finished scap sync-world: Backport for Enable AA test on all wikis (T399486) (duration: 08m 06s)
  • 17:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T400854)', diff saved to https://phabricator.wikimedia.org/P80456 and previous config saved to /var/cache/conftool/dbconfig/20250801-173431-ladsgroup.json
  • 17:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T400854)', diff saved to https://phabricator.wikimedia.org/P80455 and previous config saved to /var/cache/conftool/dbconfig/20250801-173056-ladsgroup.json
  • 17:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 17:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T400854)', diff saved to https://phabricator.wikimedia.org/P80454 and previous config saved to /var/cache/conftool/dbconfig/20250801-173033-ladsgroup.json
  • 17:30 cjming@deploy1003: ksarabia, cjming: Continuing with sync
  • 17:29 cjming@deploy1003: ksarabia, cjming: Backport for Enable AA test on all wikis (T399486) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 17:27 cjming@deploy1003: Started scap sync-world: Backport for Enable AA test on all wikis (T399486)
  • 17:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P80453 and previous config saved to /var/cache/conftool/dbconfig/20250801-171525-ladsgroup.json
  • 17:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P80452 and previous config saved to /var/cache/conftool/dbconfig/20250801-170018-ladsgroup.json
  • 16:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T400854)', diff saved to https://phabricator.wikimedia.org/P80451 and previous config saved to /var/cache/conftool/dbconfig/20250801-164510-ladsgroup.json
  • 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T400854)', diff saved to https://phabricator.wikimedia.org/P80450 and previous config saved to /var/cache/conftool/dbconfig/20250801-164134-ladsgroup.json
  • 16:41 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 16:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80449 and previous config saved to /var/cache/conftool/dbconfig/20250801-164111-ladsgroup.json
  • 16:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P80448 and previous config saved to /var/cache/conftool/dbconfig/20250801-162603-ladsgroup.json
  • 16:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P80447 and previous config saved to /var/cache/conftool/dbconfig/20250801-161056-ladsgroup.json
  • 16:08 jly@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:08 jly@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:08 jly@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:08 jly@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:08 jly@deploy1003: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:07 jly@deploy1003: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:07 jly@deploy1003: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 16:07 jly@deploy1003: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 16:07 jly@deploy1003: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 16:07 jly@deploy1003: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80446 and previous config saved to /var/cache/conftool/dbconfig/20250801-155548-ladsgroup.json
  • 15:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T400854)', diff saved to https://phabricator.wikimedia.org/P80445 and previous config saved to /var/cache/conftool/dbconfig/20250801-155212-ladsgroup.json
  • 15:52 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 15:51 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T400854)', diff saved to https://phabricator.wikimedia.org/P80444 and previous config saved to /var/cache/conftool/dbconfig/20250801-155024-ladsgroup.json
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P80442 and previous config saved to /var/cache/conftool/dbconfig/20250801-153516-ladsgroup.json
  • 15:30 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:22 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P80441 and previous config saved to /var/cache/conftool/dbconfig/20250801-152009-ladsgroup.json
  • 15:13 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:12 ayounsi@cumin1003: END (FAIL) - Cookbook sre.network.provision (exit_code=99) for device lsw1-e2-codfw.mgmt.codfw.wmnet
  • 15:12 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:12 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e2-codfw - ayounsi@cumin1003"
  • 15:12 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove management record for lsw1-e2-codfw - ayounsi@cumin1003"
  • 15:09 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:08 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 15:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T400854)', diff saved to https://phabricator.wikimedia.org/P80440 and previous config saved to /var/cache/conftool/dbconfig/20250801-150501-ladsgroup.json
  • 15:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T400854)', diff saved to https://phabricator.wikimedia.org/P80439 and previous config saved to /var/cache/conftool/dbconfig/20250801-150228-ladsgroup.json
  • 15:02 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 15:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80438 and previous config saved to /var/cache/conftool/dbconfig/20250801-150119-ladsgroup.json
  • 14:53 cjming@deploy1003: Finished scap sync-world: Backport for Revert "MetricsPlatform: Disable synchronous configs fetching" (duration: 08m 50s)
  • 14:48 cjming@deploy1003: cjming: Continuing with sync
  • 14:46 cjming@deploy1003: cjming: Backport for Revert "MetricsPlatform: Disable synchronous configs fetching" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 14:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P80437 and previous config saved to /var/cache/conftool/dbconfig/20250801-144611-ladsgroup.json
  • 14:44 cjming@deploy1003: Started scap sync-world: Backport for Revert "MetricsPlatform: Disable synchronous configs fetching"
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P80436 and previous config saved to /var/cache/conftool/dbconfig/20250801-143104-ladsgroup.json
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80435 and previous config saved to /var/cache/conftool/dbconfig/20250801-141553-ladsgroup.json
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T400854)', diff saved to https://phabricator.wikimedia.org/P80434 and previous config saved to /var/cache/conftool/dbconfig/20250801-141320-ladsgroup.json
  • 14:13 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T400854)', diff saved to https://phabricator.wikimedia.org/P80433 and previous config saved to /var/cache/conftool/dbconfig/20250801-141308-ladsgroup.json
  • 14:05 elukey: upgrade the redis-server and tools packages on idm nodes to pick up security fixes
  • 13:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P80432 and previous config saved to /var/cache/conftool/dbconfig/20250801-135800-ladsgroup.json
  • 13:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P80431 and previous config saved to /var/cache/conftool/dbconfig/20250801-134253-ladsgroup.json
  • 13:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:37 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e2-codfw - ayounsi@cumin1003"
  • 13:37 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e2-codfw - ayounsi@cumin1003"
  • 13:33 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
  • 13:33 ayounsi@cumin1003: START - Cookbook sre.network.provision for device lsw1-e2-codfw.mgmt.codfw.wmnet
  • 13:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T400854)', diff saved to https://phabricator.wikimedia.org/P80430 and previous config saved to /var/cache/conftool/dbconfig/20250801-132745-ladsgroup.json
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T400854)', diff saved to https://phabricator.wikimedia.org/P80429 and previous config saved to /var/cache/conftool/dbconfig/20250801-132514-ladsgroup.json
  • 13:25 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 13:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T400854)', diff saved to https://phabricator.wikimedia.org/P80428 and previous config saved to /var/cache/conftool/dbconfig/20250801-132451-ladsgroup.json
  • 13:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P80427 and previous config saved to /var/cache/conftool/dbconfig/20250801-130943-ladsgroup.json
  • 12:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 274685
  • 12:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 274685
  • 12:57 Amir1: re-running recountCategories.php on all wikis except s4 and s1 (T400987)
  • 12:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263252
  • 12:57 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 263252
  • 12:57 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 37662
  • 12:56 jiji@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 12:56 jiji@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 12:56 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 37662
  • 12:55 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 5400
  • 12:54 ladsgroup@deploy1003: Finished scap sync-world: Backport for recountCategories: Avoid escaping column name (T400987) (duration: 08m 36s)
  • 12:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P80426 and previous config saved to /var/cache/conftool/dbconfig/20250801-125436-ladsgroup.json
  • 12:53 ayounsi@cumin1003: START - Cookbook sre.network.peering with action 'email' for AS: 5400
  • 12:49 ladsgroup@deploy1003: ladsgroup: Continuing with sync
  • 12:48 ladsgroup@deploy1003: ladsgroup: Backport for recountCategories: Avoid escaping column name (T400987) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 12:47 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:46 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:46 ladsgroup@deploy1003: Started scap sync-world: Backport for recountCategories: Avoid escaping column name (T400987)
  • 12:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T400854)', diff saved to https://phabricator.wikimedia.org/P80424 and previous config saved to /var/cache/conftool/dbconfig/20250801-123928-ladsgroup.json
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T400854)', diff saved to https://phabricator.wikimedia.org/P80423 and previous config saved to /var/cache/conftool/dbconfig/20250801-123057-ladsgroup.json
  • 12:30 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T400854)', diff saved to https://phabricator.wikimedia.org/P80422 and previous config saved to /var/cache/conftool/dbconfig/20250801-123034-ladsgroup.json
  • 12:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P80421 and previous config saved to /var/cache/conftool/dbconfig/20250801-121526-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P80420 and previous config saved to /var/cache/conftool/dbconfig/20250801-120019-ladsgroup.json
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T400854)', diff saved to https://phabricator.wikimedia.org/P80419 and previous config saved to /var/cache/conftool/dbconfig/20250801-114511-ladsgroup.json
  • 11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T400854)', diff saved to https://phabricator.wikimedia.org/P80418 and previous config saved to /var/cache/conftool/dbconfig/20250801-114238-ladsgroup.json
  • 11:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:42 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 11:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80417 and previous config saved to /var/cache/conftool/dbconfig/20250801-114155-ladsgroup.json
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P80415 and previous config saved to /var/cache/conftool/dbconfig/20250801-112647-ladsgroup.json
  • 11:24 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 11:18 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159', diff saved to https://phabricator.wikimedia.org/P80414 and previous config saved to /var/cache/conftool/dbconfig/20250801-111139-ladsgroup.json
  • 10:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80413 and previous config saved to /var/cache/conftool/dbconfig/20250801-105631-ladsgroup.json
  • 10:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1159 (T400854)', diff saved to https://phabricator.wikimedia.org/P80412 and previous config saved to /var/cache/conftool/dbconfig/20250801-105400-ladsgroup.json
  • 10:53 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1159.eqiad.wmnet with reason: Maintenance
  • 10:18 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 10:14 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 10:01 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 09:52 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 09:44 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bookworm
  • 09:38 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bookworm
  • 09:38 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 09:32 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 09:02 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2043.codfw.wmnet with OS bullseye
  • 08:55 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS bullseye
  • 08:06 bwojtowicz@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 08:01 bwojtowicz@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 07:55 bwojtowicz@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 07:42 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
  • 07:41 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
  • 06:04 tstarling@deploy1003: Finished scap sync-world: Backport for Enable sitemaps API (T400023) (duration: 49m 59s)
  • 05:59 tstarling@deploy1003: tstarling: Continuing with sync
  • 05:16 tstarling@deploy1003: tstarling: Backport for Enable sitemaps API (T400023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
  • 05:14 tstarling@deploy1003: Started scap sync-world: Backport for Enable sitemaps API (T400023)
  • 01:11 mwpresync@deploy1003: Finished scap build-images: Publishing wmf/next image (duration: 10m 44s)
  • 01:00 mwpresync@deploy1003: Started scap build-images: Publishing wmf/next image
  • 00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T400854)', diff saved to https://phabricator.wikimedia.org/P80408 and previous config saved to /var/cache/conftool/dbconfig/20250801-005907-ladsgroup.json
  • 00:51 eileen: * civicrm upgraded from 82a5306d to f202b616
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P80407 and previous config saved to /var/cache/conftool/dbconfig/20250801-004359-ladsgroup.json
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P80406 and previous config saved to /var/cache/conftool/dbconfig/20250801-002852-ladsgroup.json
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T400854)', diff saved to https://phabricator.wikimedia.org/P80405 and previous config saved to /var/cache/conftool/dbconfig/20250801-001345-ladsgroup.json
  • 00:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2224 (T400854)', diff saved to https://phabricator.wikimedia.org/P80404 and previous config saved to /var/cache/conftool/dbconfig/20250801-001119-ladsgroup.json
  • 00:11 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2224.codfw.wmnet with reason: Maintenance
  • 00:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T400854)', diff saved to https://phabricator.wikimedia.org/P80403 and previous config saved to /var/cache/conftool/dbconfig/20250801-001055-ladsgroup.json


