Server Admin Log/Archive 102
2026-02-28
- 16:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 16:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T418465)', diff saved to https://phabricator.wikimedia.org/P89270 and previous config saved to /var/cache/conftool/dbconfig/20260228-163249-marostegui.json
- 16:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P89269 and previous config saved to /var/cache/conftool/dbconfig/20260228-161741-marostegui.json
- 16:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259', diff saved to https://phabricator.wikimedia.org/P89268 and previous config saved to /var/cache/conftool/dbconfig/20260228-160233-marostegui.json
- 15:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1259 (T418465)', diff saved to https://phabricator.wikimedia.org/P89267 and previous config saved to /var/cache/conftool/dbconfig/20260228-154725-marostegui.json
- 15:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1259 (T418465)', diff saved to https://phabricator.wikimedia.org/P89266 and previous config saved to /var/cache/conftool/dbconfig/20260228-154129-marostegui.json
- 15:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1259.eqiad.wmnet with reason: Maintenance
- 15:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T418465)', diff saved to https://phabricator.wikimedia.org/P89265 and previous config saved to /var/cache/conftool/dbconfig/20260228-154104-marostegui.json
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P89264 and previous config saved to /var/cache/conftool/dbconfig/20260228-152556-marostegui.json
- 15:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254', diff saved to https://phabricator.wikimedia.org/P89263 and previous config saved to /var/cache/conftool/dbconfig/20260228-151048-marostegui.json
- 14:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1254 (T418465)', diff saved to https://phabricator.wikimedia.org/P89262 and previous config saved to /var/cache/conftool/dbconfig/20260228-145540-marostegui.json
- 14:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1254 (T418465)', diff saved to https://phabricator.wikimedia.org/P89261 and previous config saved to /var/cache/conftool/dbconfig/20260228-145003-marostegui.json
- 14:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1254.eqiad.wmnet with reason: Maintenance
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89260 and previous config saved to /var/cache/conftool/dbconfig/20260228-144905-marostegui.json
- 14:46 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 14:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T418465)', diff saved to https://phabricator.wikimedia.org/P89259 and previous config saved to /var/cache/conftool/dbconfig/20260228-144602-marostegui.json
- 14:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P89258 and previous config saved to /var/cache/conftool/dbconfig/20260228-143358-marostegui.json
- 14:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P89257 and previous config saved to /var/cache/conftool/dbconfig/20260228-143055-marostegui.json
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238', diff saved to https://phabricator.wikimedia.org/P89256 and previous config saved to /var/cache/conftool/dbconfig/20260228-141849-marostegui.json
- 14:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P89255 and previous config saved to /var/cache/conftool/dbconfig/20260228-141546-marostegui.json
- 14:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89254 and previous config saved to /var/cache/conftool/dbconfig/20260228-140341-marostegui.json
- 14:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T418465)', diff saved to https://phabricator.wikimedia.org/P89253 and previous config saved to /var/cache/conftool/dbconfig/20260228-140038-marostegui.json
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2238 (T418465)', diff saved to https://phabricator.wikimedia.org/P89252 and previous config saved to /var/cache/conftool/dbconfig/20260228-135759-marostegui.json
- 13:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2238.codfw.wmnet with reason: Maintenance
- 13:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T418465)', diff saved to https://phabricator.wikimedia.org/P89251 and previous config saved to /var/cache/conftool/dbconfig/20260228-135734-marostegui.json
- 13:54 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1233 (T418465)', diff saved to https://phabricator.wikimedia.org/P89250 and previous config saved to /var/cache/conftool/dbconfig/20260228-135446-marostegui.json
- 13:54 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1233.eqiad.wmnet with reason: Maintenance
- 13:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T418465)', diff saved to https://phabricator.wikimedia.org/P89249 and previous config saved to /var/cache/conftool/dbconfig/20260228-135421-marostegui.json
- 13:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P89248 and previous config saved to /var/cache/conftool/dbconfig/20260228-134227-marostegui.json
- 13:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P89247 and previous config saved to /var/cache/conftool/dbconfig/20260228-133913-marostegui.json
- 13:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226', diff saved to https://phabricator.wikimedia.org/P89246 and previous config saved to /var/cache/conftool/dbconfig/20260228-132718-marostegui.json
- 13:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P89245 and previous config saved to /var/cache/conftool/dbconfig/20260228-132404-marostegui.json
- 13:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2226 (T418465)', diff saved to https://phabricator.wikimedia.org/P89244 and previous config saved to /var/cache/conftool/dbconfig/20260228-131210-marostegui.json
- 13:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2226 (T418465)', diff saved to https://phabricator.wikimedia.org/P89243 and previous config saved to /var/cache/conftool/dbconfig/20260228-130938-marostegui.json
- 13:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2226.codfw.wmnet with reason: Maintenance
- 13:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T418465)', diff saved to https://phabricator.wikimedia.org/P89242 and previous config saved to /var/cache/conftool/dbconfig/20260228-130913-marostegui.json
- 13:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T418465)', diff saved to https://phabricator.wikimedia.org/P89241 and previous config saved to /var/cache/conftool/dbconfig/20260228-130857-marostegui.json
- 13:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1229 (T418465)', diff saved to https://phabricator.wikimedia.org/P89240 and previous config saved to /var/cache/conftool/dbconfig/20260228-130308-marostegui.json
- 13:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1229.eqiad.wmnet with reason: Maintenance
- 12:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 12:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T418465)', diff saved to https://phabricator.wikimedia.org/P89239 and previous config saved to /var/cache/conftool/dbconfig/20260228-125843-marostegui.json
- 12:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P89238 and previous config saved to /var/cache/conftool/dbconfig/20260228-125405-marostegui.json
- 12:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P89237 and previous config saved to /var/cache/conftool/dbconfig/20260228-124335-marostegui.json
- 12:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225', diff saved to https://phabricator.wikimedia.org/P89236 and previous config saved to /var/cache/conftool/dbconfig/20260228-123857-marostegui.json
- 12:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P89235 and previous config saved to /var/cache/conftool/dbconfig/20260228-122827-marostegui.json
- 12:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2225 (T418465)', diff saved to https://phabricator.wikimedia.org/P89234 and previous config saved to /var/cache/conftool/dbconfig/20260228-122348-marostegui.json
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2225 (T418465)', diff saved to https://phabricator.wikimedia.org/P89233 and previous config saved to /var/cache/conftool/dbconfig/20260228-121753-marostegui.json
- 12:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2225.codfw.wmnet with reason: Maintenance
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T418465)', diff saved to https://phabricator.wikimedia.org/P89232 and previous config saved to /var/cache/conftool/dbconfig/20260228-121727-marostegui.json
- 12:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T418465)', diff saved to https://phabricator.wikimedia.org/P89231 and previous config saved to /var/cache/conftool/dbconfig/20260228-121318-marostegui.json
- 12:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1197 (T418465)', diff saved to https://phabricator.wikimedia.org/P89230 and previous config saved to /var/cache/conftool/dbconfig/20260228-121102-marostegui.json
- 12:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1197.eqiad.wmnet with reason: Maintenance
- 12:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89229 and previous config saved to /var/cache/conftool/dbconfig/20260228-121037-marostegui.json
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P89228 and previous config saved to /var/cache/conftool/dbconfig/20260228-120220-marostegui.json
- 11:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P89227 and previous config saved to /var/cache/conftool/dbconfig/20260228-115530-marostegui.json
- 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P89226 and previous config saved to /var/cache/conftool/dbconfig/20260228-114711-marostegui.json
- 11:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P89225 and previous config saved to /var/cache/conftool/dbconfig/20260228-114021-marostegui.json
- 11:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T418465)', diff saved to https://phabricator.wikimedia.org/P89224 and previous config saved to /var/cache/conftool/dbconfig/20260228-113203-marostegui.json
- 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2204 (T418465)', diff saved to https://phabricator.wikimedia.org/P89223 and previous config saved to /var/cache/conftool/dbconfig/20260228-112931-marostegui.json
- 11:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2204.codfw.wmnet with reason: Maintenance
- 11:25 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89222 and previous config saved to /var/cache/conftool/dbconfig/20260228-112513-marostegui.json
- 11:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T418465)', diff saved to https://phabricator.wikimedia.org/P89221 and previous config saved to /var/cache/conftool/dbconfig/20260228-112503-marostegui.json
- 11:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1188 (T418465)', diff saved to https://phabricator.wikimedia.org/P89220 and previous config saved to /var/cache/conftool/dbconfig/20260228-112256-marostegui.json
- 11:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1188.eqiad.wmnet with reason: Maintenance
- 11:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89219 and previous config saved to /var/cache/conftool/dbconfig/20260228-112231-marostegui.json
- 11:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P89218 and previous config saved to /var/cache/conftool/dbconfig/20260228-110954-marostegui.json
- 11:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P89217 and previous config saved to /var/cache/conftool/dbconfig/20260228-110723-marostegui.json
- 10:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P89216 and previous config saved to /var/cache/conftool/dbconfig/20260228-105446-marostegui.json
- 10:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P89215 and previous config saved to /var/cache/conftool/dbconfig/20260228-105214-marostegui.json
- 10:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T418465)', diff saved to https://phabricator.wikimedia.org/P89214 and previous config saved to /var/cache/conftool/dbconfig/20260228-103938-marostegui.json
- 10:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89213 and previous config saved to /var/cache/conftool/dbconfig/20260228-103706-marostegui.json
- 10:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2189 (T418465)', diff saved to https://phabricator.wikimedia.org/P89212 and previous config saved to /var/cache/conftool/dbconfig/20260228-103352-marostegui.json
- 10:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2189.codfw.wmnet with reason: Maintenance
- 10:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89211 and previous config saved to /var/cache/conftool/dbconfig/20260228-103327-marostegui.json
- 10:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1182 (T418465)', diff saved to https://phabricator.wikimedia.org/P89210 and previous config saved to /var/cache/conftool/dbconfig/20260228-102952-marostegui.json
- 10:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1182.eqiad.wmnet with reason: Maintenance
- 10:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T418465)', diff saved to https://phabricator.wikimedia.org/P89209 and previous config saved to /var/cache/conftool/dbconfig/20260228-102927-marostegui.json
- 10:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P89208 and previous config saved to /var/cache/conftool/dbconfig/20260228-101818-marostegui.json
- 10:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P89207 and previous config saved to /var/cache/conftool/dbconfig/20260228-101419-marostegui.json
- 10:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P89206 and previous config saved to /var/cache/conftool/dbconfig/20260228-100310-marostegui.json
- 09:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P89205 and previous config saved to /var/cache/conftool/dbconfig/20260228-095911-marostegui.json
- 09:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89204 and previous config saved to /var/cache/conftool/dbconfig/20260228-094802-marostegui.json
- 09:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T418465)', diff saved to https://phabricator.wikimedia.org/P89203 and previous config saved to /var/cache/conftool/dbconfig/20260228-094402-marostegui.json
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89202 and previous config saved to /var/cache/conftool/dbconfig/20260228-094157-marostegui.json
- 09:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2175.codfw.wmnet with reason: Maintenance
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1162 (T418465)', diff saved to https://phabricator.wikimedia.org/P89201 and previous config saved to /var/cache/conftool/dbconfig/20260228-094146-marostegui.json
- 09:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1162.eqiad.wmnet with reason: Maintenance
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T418465)', diff saved to https://phabricator.wikimedia.org/P89200 and previous config saved to /var/cache/conftool/dbconfig/20260228-094133-marostegui.json
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89199 and previous config saved to /var/cache/conftool/dbconfig/20260228-094122-marostegui.json
- 09:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P89198 and previous config saved to /var/cache/conftool/dbconfig/20260228-092625-marostegui.json
- 09:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P89197 and previous config saved to /var/cache/conftool/dbconfig/20260228-092614-marostegui.json
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P89196 and previous config saved to /var/cache/conftool/dbconfig/20260228-091116-marostegui.json
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P89195 and previous config saved to /var/cache/conftool/dbconfig/20260228-091105-marostegui.json
- 08:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T418465)', diff saved to https://phabricator.wikimedia.org/P89194 and previous config saved to /var/cache/conftool/dbconfig/20260228-085608-marostegui.json
- 08:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89193 and previous config saved to /var/cache/conftool/dbconfig/20260228-085557-marostegui.json
- 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89192 and previous config saved to /var/cache/conftool/dbconfig/20260228-084957-marostegui.json
- 08:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2148.codfw.wmnet with reason: Maintenance
- 08:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 08:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1156.eqiad.wmnet with reason: Maintenance
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 06:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 58s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-27
- 23:02 cscott@deploy2002: Finished scap sync-world: Backport for Ensure that Parsoid canonical HTML is not language converted (T418549) (duration: 08m 47s)
- 22:58 cscott@deploy2002: cscott: Continuing with sync
- 22:55 cscott@deploy2002: cscott: Backport for Ensure that Parsoid canonical HTML is not language converted (T418549) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:53 cscott@deploy2002: Started scap sync-world: Backport for Ensure that Parsoid canonical HTML is not language converted (T418549)
- 21:38 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2058.codfw.wmnet with OS trixie
- 21:36 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2057.codfw.wmnet with OS trixie
- 21:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2058.codfw.wmnet with reason: host reimage
- 21:15 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2057.codfw.wmnet with reason: host reimage
- 21:14 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:14 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:12 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2056.codfw.wmnet with OS trixie
- 21:11 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2058.codfw.wmnet with reason: host reimage
- 21:11 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2055.codfw.wmnet with OS trixie
- 21:09 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2057.codfw.wmnet with reason: host reimage
- 21:08 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2054.codfw.wmnet with OS trixie
- 21:00 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 21:00 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 20:57 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2058.codfw.wmnet with OS trixie
- 20:55 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2057.codfw.wmnet with OS trixie
- 20:51 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2056.codfw.wmnet with reason: host reimage
- 20:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 20:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89190 and previous config saved to /var/cache/conftool/dbconfig/20260227-205031-marostegui.json
- 20:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2055.codfw.wmnet with reason: host reimage
- 20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2054.codfw.wmnet with reason: host reimage
- 20:41 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2056.codfw.wmnet with reason: host reimage
- 20:40 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2055.codfw.wmnet with reason: host reimage
- 20:39 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2054.codfw.wmnet with reason: host reimage
- 20:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P89189 and previous config saved to /var/cache/conftool/dbconfig/20260227-203523-marostegui.json
- 20:27 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2056.codfw.wmnet with OS trixie
- 20:26 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2055.codfw.wmnet with OS trixie
- 20:25 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2054.codfw.wmnet with OS trixie
- 20:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P89188 and previous config saved to /var/cache/conftool/dbconfig/20260227-202015-marostegui.json
- 20:07 ryankemper: [WDQS] `ryankemper@wdqs1014:~$ sudo systemctl restart wdqs-blazegraph` (lag was high, see https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&from=2026-02-27T18:05:46.506Z&to=2026-02-27T20:03:42.806Z&timezone=utc&var-cluster_name=wdqs-main&var-graph_type=%289102%7C919%5B35%5D%29&refresh=1m&viewPanel=panel-8)
- 20:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89187 and previous config saved to /var/cache/conftool/dbconfig/20260227-200507-marostegui.json
- 19:59 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:59 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:57 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:57 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2227 (T418465)', diff saved to https://phabricator.wikimedia.org/P89186 and previous config saved to /var/cache/conftool/dbconfig/20260227-194051-marostegui.json
- 19:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 19:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T418465)', diff saved to https://phabricator.wikimedia.org/P89185 and previous config saved to /var/cache/conftool/dbconfig/20260227-194026-marostegui.json
- 19:32 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:31 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:30 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:30 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P89183 and previous config saved to /var/cache/conftool/dbconfig/20260227-192517-marostegui.json
- 19:18 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:18 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:16 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:16 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:16 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:16 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:15 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:15 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:15 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:15 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P89182 and previous config saved to /var/cache/conftool/dbconfig/20260227-191009-marostegui.json
- 18:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T418465)', diff saved to https://phabricator.wikimedia.org/P89181 and previous config saved to /var/cache/conftool/dbconfig/20260227-185500-marostegui.json
- 18:53 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:53 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:34 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:34 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:33 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:32 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2205 (T418465)', diff saved to https://phabricator.wikimedia.org/P89180 and previous config saved to /var/cache/conftool/dbconfig/20260227-183038-marostegui.json
- 18:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 18:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89179 and previous config saved to /var/cache/conftool/dbconfig/20260227-183013-marostegui.json
- 18:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 18:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 18:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89178 and previous config saved to /var/cache/conftool/dbconfig/20260227-181632-marostegui.json
- 18:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P89177 and previous config saved to /var/cache/conftool/dbconfig/20260227-181505-marostegui.json
- 18:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P89176 and previous config saved to /var/cache/conftool/dbconfig/20260227-180123-marostegui.json
- 18:00 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:00 ebernhardson@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 17:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P89175 and previous config saved to /var/cache/conftool/dbconfig/20260227-175957-marostegui.json
- 17:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P89174 and previous config saved to /var/cache/conftool/dbconfig/20260227-174615-marostegui.json
- 17:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89173 and previous config saved to /var/cache/conftool/dbconfig/20260227-174448-marostegui.json
- 17:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89172 and previous config saved to /var/cache/conftool/dbconfig/20260227-173107-marostegui.json
- 17:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1223 (T418465)', diff saved to https://phabricator.wikimedia.org/P89171 and previous config saved to /var/cache/conftool/dbconfig/20260227-172111-marostegui.json
- 17:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 17:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T418465)', diff saved to https://phabricator.wikimedia.org/P89170 and previous config saved to /var/cache/conftool/dbconfig/20260227-172046-marostegui.json
- 17:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2194 (T418465)', diff saved to https://phabricator.wikimedia.org/P89169 and previous config saved to /var/cache/conftool/dbconfig/20260227-172036-marostegui.json
- 17:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 17:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89168 and previous config saved to /var/cache/conftool/dbconfig/20260227-172011-marostegui.json
- 17:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P89167 and previous config saved to /var/cache/conftool/dbconfig/20260227-170530-marostegui.json
- 17:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P89166 and previous config saved to /var/cache/conftool/dbconfig/20260227-170503-marostegui.json
- 16:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P89165 and previous config saved to /var/cache/conftool/dbconfig/20260227-165022-marostegui.json
- 16:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P89164 and previous config saved to /var/cache/conftool/dbconfig/20260227-164954-marostegui.json
- 16:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T418465)', diff saved to https://phabricator.wikimedia.org/P89163 and previous config saved to /var/cache/conftool/dbconfig/20260227-163514-marostegui.json
- 16:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89162 and previous config saved to /var/cache/conftool/dbconfig/20260227-163446-marostegui.json
- 16:28 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T418465)', diff saved to https://phabricator.wikimedia.org/P89161 and previous config saved to /var/cache/conftool/dbconfig/20260227-162806-marostegui.json
- 16:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on 6 hosts with reason: Maintenance
- 16:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 16:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T418465)', diff saved to https://phabricator.wikimedia.org/P89160 and previous config saved to /var/cache/conftool/dbconfig/20260227-162741-marostegui.json
- 16:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P89159 and previous config saved to /var/cache/conftool/dbconfig/20260227-161233-marostegui.json
- 16:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2190 (T418465)', diff saved to https://phabricator.wikimedia.org/P89158 and previous config saved to /var/cache/conftool/dbconfig/20260227-161127-marostegui.json
- 16:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 16:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89157 and previous config saved to /var/cache/conftool/dbconfig/20260227-161103-marostegui.json
- 15:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P89156 and previous config saved to /var/cache/conftool/dbconfig/20260227-155724-marostegui.json
- 15:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P89155 and previous config saved to /var/cache/conftool/dbconfig/20260227-155554-marostegui.json
- 15:55 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:54 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:51 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:51 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T418465)', diff saved to https://phabricator.wikimedia.org/P89154 and previous config saved to /var/cache/conftool/dbconfig/20260227-154216-marostegui.json
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P89153 and previous config saved to /var/cache/conftool/dbconfig/20260227-154046-marostegui.json
- 15:40 tgr@deploy2002: Finished scap sync-world: Backport for session: Log stack trace for JWT errors, tests: Fix missing JWT issuer for CentralAuthSessionProvider (T418487 T415007), session: Log stack trace for JWT errors (duration: 08m 11s)
- 15:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1198 (T418465)', diff saved to https://phabricator.wikimedia.org/P89152 and previous config saved to /var/cache/conftool/dbconfig/20260227-153606-marostegui.json
- 15:36 tgr@deploy2002: tgr: Continuing with sync
- 15:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 15:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89151 and previous config saved to /var/cache/conftool/dbconfig/20260227-153541-marostegui.json
- 15:33 tgr@deploy2002: tgr: Backport for session: Log stack trace for JWT errors, tests: Fix missing JWT issuer for CentralAuthSessionProvider (T418487 T415007), session: Log stack trace for JWT errors synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:32 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:31 tgr@deploy2002: Started scap sync-world: Backport for session: Log stack trace for JWT errors, tests: Fix missing JWT issuer for CentralAuthSessionProvider (T418487 T415007), session: Log stack trace for JWT errors
- 15:31 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:31 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:30 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:26 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:26 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89150 and previous config saved to /var/cache/conftool/dbconfig/20260227-152538-marostegui.json
- 15:24 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:24 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P89149 and previous config saved to /var/cache/conftool/dbconfig/20260227-152033-marostegui.json
- 15:06 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P89148 and previous config saved to /var/cache/conftool/dbconfig/20260227-150525-marostegui.json
- 15:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:05 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 15:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2177 (T418465)', diff saved to https://phabricator.wikimedia.org/P89147 and previous config saved to /var/cache/conftool/dbconfig/20260227-150322-marostegui.json
- 15:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 15:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89146 and previous config saved to /var/cache/conftool/dbconfig/20260227-150257-marostegui.json
- 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:53 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89144 and previous config saved to /var/cache/conftool/dbconfig/20260227-145016-marostegui.json
- 14:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P89143 and previous config saved to /var/cache/conftool/dbconfig/20260227-144749-marostegui.json
- 14:45 tgr_: emergency deploy for T415007#11658252 done
- 14:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T418465)', diff saved to https://phabricator.wikimedia.org/P89142 and previous config saved to /var/cache/conftool/dbconfig/20260227-144407-marostegui.json
- 14:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 14:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89141 and previous config saved to /var/cache/conftool/dbconfig/20260227-144353-marostegui.json
- 14:41 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis)" (T415007) (duration: 13m 32s)
- 14:37 tgr@deploy2002: tgr: Continuing with sync
- 14:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P89140 and previous config saved to /var/cache/conftool/dbconfig/20260227-143241-marostegui.json
- 14:29 tgr@deploy2002: tgr: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis)" (T415007) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P89139 and previous config saved to /var/cache/conftool/dbconfig/20260227-142844-marostegui.json
- 14:27 tgr@deploy2002: Started scap sync-world: Backport for Revert "Enable JWT session cookie for bot passwords (all wikis)" (T415007)
- 14:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89138 and previous config saved to /var/cache/conftool/dbconfig/20260227-141732-marostegui.json
- 14:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P89137 and previous config saved to /var/cache/conftool/dbconfig/20260227-141336-marostegui.json
- 14:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 14:10 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89136 and previous config saved to /var/cache/conftool/dbconfig/20260227-135827-marostegui.json
- 13:55 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2156 (T418465)', diff saved to https://phabricator.wikimedia.org/P89135 and previous config saved to /var/cache/conftool/dbconfig/20260227-135516-marostegui.json
- 13:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 13:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T418465)', diff saved to https://phabricator.wikimedia.org/P89134 and previous config saved to /var/cache/conftool/dbconfig/20260227-135451-marostegui.json
- 13:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T418465)', diff saved to https://phabricator.wikimedia.org/P89133 and previous config saved to /var/cache/conftool/dbconfig/20260227-134855-marostegui.json
- 13:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 13:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89132 and previous config saved to /var/cache/conftool/dbconfig/20260227-134831-marostegui.json
- 13:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P89131 and previous config saved to /var/cache/conftool/dbconfig/20260227-133943-marostegui.json
- 13:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P89130 and previous config saved to /var/cache/conftool/dbconfig/20260227-133323-marostegui.json
- 13:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P89129 and previous config saved to /var/cache/conftool/dbconfig/20260227-132434-marostegui.json
- 13:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P89128 and previous config saved to /var/cache/conftool/dbconfig/20260227-131815-marostegui.json
- 13:15 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T418465)', diff saved to https://phabricator.wikimedia.org/P89127 and previous config saved to /var/cache/conftool/dbconfig/20260227-130926-marostegui.json
- 13:06 jelto@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
- 13:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89126 and previous config saved to /var/cache/conftool/dbconfig/20260227-130306-marostegui.json
- 13:00 jelto@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
- 12:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T418465)', diff saved to https://phabricator.wikimedia.org/P89125 and previous config saved to /var/cache/conftool/dbconfig/20260227-125654-marostegui.json
- 12:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 12:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 12:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2149 (T418465)', diff saved to https://phabricator.wikimedia.org/P89124 and previous config saved to /var/cache/conftool/dbconfig/20260227-124711-marostegui.json
- 12:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 12:47 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1149.eqiad.wmnet
- 12:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2229.codfw.wmnet with reason: Maintenance
- 12:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T418465)', diff saved to https://phabricator.wikimedia.org/P89123 and previous config saved to /var/cache/conftool/dbconfig/20260227-124029-marostegui.json
- 12:35 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1149.eqiad.wmnet
- 12:35 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1148.eqiad.wmnet
- 12:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P89122 and previous config saved to /var/cache/conftool/dbconfig/20260227-122521-marostegui.json
- 12:24 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1148.eqiad.wmnet
- 12:24 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1147.eqiad.wmnet
- 12:15 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1147.eqiad.wmnet
- 12:15 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1146.eqiad.wmnet
- 12:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224', diff saved to https://phabricator.wikimedia.org/P89121 and previous config saved to /var/cache/conftool/dbconfig/20260227-121012-marostegui.json
- 12:06 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1146.eqiad.wmnet
- 12:06 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1145.eqiad.wmnet
- 11:55 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1145.eqiad.wmnet
- 11:55 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1144.eqiad.wmnet
- 11:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2224 (T418465)', diff saved to https://phabricator.wikimedia.org/P89120 and previous config saved to /var/cache/conftool/dbconfig/20260227-115504-marostegui.json
- 11:50 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2224 (T418465)', diff saved to https://phabricator.wikimedia.org/P89119 and previous config saved to /var/cache/conftool/dbconfig/20260227-115026-marostegui.json
- 11:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2224.codfw.wmnet with reason: Maintenance
- 11:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T418465)', diff saved to https://phabricator.wikimedia.org/P89118 and previous config saved to /var/cache/conftool/dbconfig/20260227-115012-marostegui.json
- 11:43 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1144.eqiad.wmnet
- 11:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1143.eqiad.wmnet
- 11:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P89117 and previous config saved to /var/cache/conftool/dbconfig/20260227-113504-marostegui.json
- 11:34 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1143.eqiad.wmnet
- 11:34 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1142.eqiad.wmnet
- 11:25 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1142.eqiad.wmnet
- 11:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P89116 and previous config saved to /var/cache/conftool/dbconfig/20260227-111956-marostegui.json
- 11:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T418465)', diff saved to https://phabricator.wikimedia.org/P89115 and previous config saved to /var/cache/conftool/dbconfig/20260227-110447-marostegui.json
- 11:02 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 11:01 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 11:00 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 11:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2217 (T418465)', diff saved to https://phabricator.wikimedia.org/P89114 and previous config saved to /var/cache/conftool/dbconfig/20260227-110011-marostegui.json
- 11:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2217.codfw.wmnet with reason: Maintenance
- 10:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89113 and previous config saved to /var/cache/conftool/dbconfig/20260227-105947-marostegui.json
- 10:56 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:51 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 10:50 elukey@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:49 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:49 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:48 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
- 10:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P89112 and previous config saved to /var/cache/conftool/dbconfig/20260227-104439-marostegui.json
- 10:42 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
- 10:40 aokoth@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
- 10:34 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
- 10:33 aokoth@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host gitlab1003.wikimedia.org
- 10:33 aokoth@cumin1003: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
- 10:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P89111 and previous config saved to /var/cache/conftool/dbconfig/20260227-102931-marostegui.json
- 10:21 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host dse-k8s-worker1026.eqiad.wmnet
- 10:18 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T418483
- 10:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89110 and previous config saved to /var/cache/conftool/dbconfig/20260227-101422-marostegui.json
- 10:10 marostegui: Failover m5-master T401966
- 10:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2214 (T418465)', diff saved to https://phabricator.wikimedia.org/P89108 and previous config saved to /var/cache/conftool/dbconfig/20260227-100944-marostegui.json
- 10:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 10:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2197.codfw.wmnet with reason: Maintenance
- 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89107 and previous config saved to /var/cache/conftool/dbconfig/20260227-100625-marostegui.json
- 09:57 marostegui@dns1004: END - running authdns-update
- 09:56 marostegui@dns1004: START - running authdns-update
- 09:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P89106 and previous config saved to /var/cache/conftool/dbconfig/20260227-095117-marostegui.json
- 09:45 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 09:44 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 09:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P89105 and previous config saved to /var/cache/conftool/dbconfig/20260227-093609-marostegui.json
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 09:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 09:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 09:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 09:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89104 and previous config saved to /var/cache/conftool/dbconfig/20260227-092101-marostegui.json
- 09:19 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1027.eqiad.wmnet with OS trixie
- 09:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2193 (T418465)', diff saved to https://phabricator.wikimedia.org/P89103 and previous config saved to /var/cache/conftool/dbconfig/20260227-091847-marostegui.json
- 09:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2193.codfw.wmnet with reason: Maintenance
- 09:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89102 and previous config saved to /var/cache/conftool/dbconfig/20260227-091822-marostegui.json
- 09:06 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Security Release - T418483
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P89101 and previous config saved to /var/cache/conftool/dbconfig/20260227-090314-marostegui.json
- 08:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
- 08:55 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1027.eqiad.wmnet with reason: host reimage
- 08:51 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P89100 and previous config saved to /var/cache/conftool/dbconfig/20260227-084806-marostegui.json
- 08:45 Emperor: restart corto on alert1002
- 08:40 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:39 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1027.eqiad.wmnet with OS trixie
- 08:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89099 and previous config saved to /var/cache/conftool/dbconfig/20260227-083257-marostegui.json
- 08:32 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:32 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89098 and previous config saved to /var/cache/conftool/dbconfig/20260227-083043-marostegui.json
- 08:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2180.codfw.wmnet with reason: Maintenance
- 08:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89097 and previous config saved to /var/cache/conftool/dbconfig/20260227-083018-marostegui.json
- 08:22 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:22 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:19 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:19 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P89096 and previous config saved to /var/cache/conftool/dbconfig/20260227-081510-marostegui.json
- 08:14 aokoth@cumin1003: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:14 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 08:05 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 08:04 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P89095 and previous config saved to /var/cache/conftool/dbconfig/20260227-080002-marostegui.json
- 07:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89094 and previous config saved to /var/cache/conftool/dbconfig/20260227-074454-marostegui.json
- 07:39 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2169 (T418465)', diff saved to https://phabricator.wikimedia.org/P89092 and previous config saved to /var/cache/conftool/dbconfig/20260227-073957-marostegui.json
- 07:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2169.codfw.wmnet with reason: Maintenance
- 07:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89091 and previous config saved to /var/cache/conftool/dbconfig/20260227-073943-marostegui.json
- 07:38 aokoth@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T418483
- 07:27 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Security Release - T418483
- 07:25 aokoth@cumin1003: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 07:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P89090 and previous config saved to /var/cache/conftool/dbconfig/20260227-072434-marostegui.json
- 07:19 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 07:18 aokoth@cumin1003: END (ERROR) - Cookbook sre.gitlab.upgrade (exit_code=97) on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 07:15 aokoth@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Security Release - T418483
- 07:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P89089 and previous config saved to /var/cache/conftool/dbconfig/20260227-070926-marostegui.json
- 06:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89088 and previous config saved to /var/cache/conftool/dbconfig/20260227-065417-marostegui.json
- 06:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2158 (T418465)', diff saved to https://phabricator.wikimedia.org/P89087 and previous config saved to /var/cache/conftool/dbconfig/20260227-064922-marostegui.json
- 06:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2158.codfw.wmnet with reason: Maintenance
- 06:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T418465)', diff saved to https://phabricator.wikimedia.org/P89086 and previous config saved to /var/cache/conftool/dbconfig/20260227-064856-marostegui.json
- 06:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1162: Repooling after switchover
- 06:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P89084 and previous config saved to /var/cache/conftool/dbconfig/20260227-063348-marostegui.json
- 06:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P89082 and previous config saved to /var/cache/conftool/dbconfig/20260227-061840-marostegui.json
- 06:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
- 06:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1244 T418079', diff saved to https://phabricator.wikimedia.org/P89081 and previous config saved to /var/cache/conftool/dbconfig/20260227-060534-marostegui.json
- 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1160 to s4 primary T418079', diff saved to https://phabricator.wikimedia.org/P89080 and previous config saved to /var/cache/conftool/dbconfig/20260227-060455-marostegui.json
- 06:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T418465)', diff saved to https://phabricator.wikimedia.org/P89078 and previous config saved to /var/cache/conftool/dbconfig/20260227-060331-marostegui.json
- 06:00 marostegui: revert: Failover m5-master T401966
- 06:00 marostegui: Starting s4 eqiad failover from db1244 to db1160 - T418079
- 05:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 42 hosts with reason: Primary switchover s4 T418079
- 05:58 marostegui@dns1006: END - running authdns-update
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1160 with weight 0 T418079', diff saved to https://phabricator.wikimedia.org/P89077 and previous config saved to /var/cache/conftool/dbconfig/20260227-055845-marostegui.json
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2151 (T418465)', diff saved to https://phabricator.wikimedia.org/P89076 and previous config saved to /var/cache/conftool/dbconfig/20260227-055835-marostegui.json
- 05:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2151.codfw.wmnet with reason: Maintenance
- 05:57 marostegui@dns1006: START - running authdns-update
- 05:56 marostegui@dns1006: END - running authdns-update
- 05:55 marostegui: Failover m5-master T401966
- 05:55 marostegui@dns1006: START - running authdns-update
- 05:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1173.eqiad.wmnet with reason: Maintenance
- 05:49 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: Repooling after switchover
- 05:49 marostegui@cumin1003: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) pool db1162: Repooling after switchover
- 05:48 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1162: Repooling after switchover
- 05:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1162 T418553', diff saved to https://phabricator.wikimedia.org/P89074 and previous config saved to /var/cache/conftool/dbconfig/20260227-054833-marostegui.json
- 05:47 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1222 to s2 primary T418553', diff saved to https://phabricator.wikimedia.org/P89073 and previous config saved to /var/cache/conftool/dbconfig/20260227-054750-marostegui.json
- 05:47 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T418553
- 05:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T418553
- 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1222 with weight 0 T418553', diff saved to https://phabricator.wikimedia.org/P89072 and previous config saved to /var/cache/conftool/dbconfig/20260227-054410-marostegui.json
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 18s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:45 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
- 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 00:01 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
2026-02-26
- 23:15 swfrench@deploy2002: Finished scap sync-world: helmfile-only deployment to clear chart version diff (duration: 02m 31s)
- 23:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2051.codfw.wmnet with OS trixie
- 23:13 swfrench@deploy2002: Started scap sync-world: helmfile-only deployment to clear chart version diff
- 23:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2049.codfw.wmnet with OS trixie
- 23:09 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2050.codfw.wmnet with OS trixie
- 23:04 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2053.codfw.wmnet with OS trixie
- 22:58 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 22:58 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2052.codfw.wmnet with OS trixie
- 22:54 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2048.codfw.wmnet with OS trixie
- 22:53 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2051.codfw.wmnet with reason: host reimage
- 22:49 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2049.codfw.wmnet with reason: host reimage
- 22:45 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2050.codfw.wmnet with reason: host reimage
- 22:41 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2053.codfw.wmnet with reason: host reimage
- 22:37 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2052.codfw.wmnet with reason: host reimage
- 22:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2048.codfw.wmnet with reason: host reimage
- 22:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2053.codfw.wmnet with reason: host reimage
- 22:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2052.codfw.wmnet with reason: host reimage
- 22:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2050.codfw.wmnet with reason: host reimage
- 22:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2051.codfw.wmnet with reason: host reimage
- 22:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2049.codfw.wmnet with reason: host reimage
- 22:27 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2048.codfw.wmnet with reason: host reimage
- 22:19 catrope@deploy2002: Finished scap sync-world: Backport for CommonSettings: Set $wgJwtSessionCookieIssuer for bot passwords (T415007), Enable JWT session cookie for bot passwords (all wikis) (T415007) (duration: 11m 48s)
- 22:16 catrope@deploy2002: catrope, d3r1ck01: Continuing with sync
- 22:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2053.codfw.wmnet with OS trixie
- 22:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2052.codfw.wmnet with OS trixie
- 22:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2051.codfw.wmnet with OS trixie
- 22:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2050.codfw.wmnet with OS trixie
- 22:14 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2049.codfw.wmnet with OS trixie
- 22:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp2048.codfw.wmnet with OS trixie
- 22:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2043.codfw.wmnet
- 22:09 catrope@deploy2002: catrope, d3r1ck01: Backport for CommonSettings: Set $wgJwtSessionCookieIssuer for bot passwords (T415007), Enable JWT session cookie for bot passwords (all wikis) (T415007) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:09 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2046.codfw.wmnet with OS trixie
- 22:08 catrope@deploy2002: Started scap sync-world: Backport for CommonSettings: Set $wgJwtSessionCookieIssuer for bot passwords (T415007), Enable JWT session cookie for bot passwords (all wikis) (T415007)
- 22:06 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2047.codfw.wmnet with OS trixie
- 22:04 catrope@deploy2002: Finished scap sync-world: Backport for Remove workaround for T370517, no longer needed (T370517) (duration: 07m 03s)
- 22:03 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2043.codfw.wmnet
- 22:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2043.codfw.wmnet
- 22:00 catrope@deploy2002: catrope: Continuing with sync
- 21:59 catrope@deploy2002: catrope: Backport for Remove workaround for T370517, no longer needed (T370517) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:57 catrope@deploy2002: Started scap sync-world: Backport for Remove workaround for T370517, no longer needed (T370517)
- 21:55 catrope@deploy2002: Finished scap sync-world: Backport for Deploy Comparative Reader Research survey on eswiki (T417834), Deploy Comparative Reader Research survey on enwiki (T417829) (duration: 07m 28s)
- 21:54 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2043.codfw.wmnet
- 21:51 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: rollout complete.
- 21:51 catrope@deploy2002: dani, catrope: Continuing with sync
- 21:49 catrope@deploy2002: dani, catrope: Backport for Deploy Comparative Reader Research survey on eswiki (T417834), Deploy Comparative Reader Research survey on enwiki (T417829) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:47 catrope@deploy2002: Started scap sync-world: Backport for Deploy Comparative Reader Research survey on eswiki (T417834), Deploy Comparative Reader Research survey on enwiki (T417829)
- 21:45 catrope@deploy2002: Finished scap sync-world: Backport for Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007), Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007) (duration: 11m 39s)
- 21:41 catrope@deploy2002: catrope, tgr: Continuing with sync
- 21:35 catrope@deploy2002: catrope, tgr: Backport for Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007), Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:34 catrope@deploy2002: Started scap sync-world: Backport for Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007), Session: Emit JWT cookie in ImmutableSessionProviderWithCookie (T415007)
- 21:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2096.codfw.wmnet with OS bullseye
- 21:33 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2095.codfw.wmnet with OS bullseye
- 21:32 catrope@deploy2002: Finished scap sync-world: Backport for EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512), EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512) (duration: 15m 19s)
- 21:28 catrope@deploy2002: tgr, catrope: Continuing with sync
- 21:26 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: starting the rollout on titan2001
- 21:18 catrope@deploy2002: tgr, catrope: Backport for EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512), EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:17 catrope@deploy2002: Started scap sync-world: Backport for EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512), EmailAuthHookHandler: Fix LoginNotify being an optional dependency (T418512)
- 21:14 catrope@deploy2002: Finished scap sync-world: Backport for Deploy PersonalDashboard to new wikis (T417665) (duration: 09m 57s)
- 21:10 catrope@deploy2002: catrope, suecarmol: Continuing with sync
- 21:06 catrope@deploy2002: catrope, suecarmol: Backport for Deploy PersonalDashboard to new wikis (T417665) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:05 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: starting the rollout on titan2002
- 21:04 catrope@deploy2002: Started scap sync-world: Backport for Deploy PersonalDashboard to new wikis (T417665)
- 21:02 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 20:58 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2046.codfw.wmnet with reason: host reimage
- 20:54 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2047.codfw.wmnet with reason: host reimage
- 20:52 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2046.codfw.wmnet with reason: host reimage
- 20:39 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2047.codfw.wmnet with OS trixie
- 20:38 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2046.codfw.wmnet with OS trixie
- 20:23 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: starting the rollout on titan1001
- 20:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 20:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263 (T415786)', diff saved to https://phabricator.wikimedia.org/P89069 and previous config saved to /var/cache/conftool/dbconfig/20260226-201451-marostegui.json
- 20:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2096.codfw.wmnet with OS bullseye
- 20:13 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2095.codfw.wmnet with OS bullseye
- 20:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['ms-be2095']
- 20:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ms-be2095']
- 20:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2043.codfw.wmnet
- 20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:05 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 20:04 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 20:04 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 20:04 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 19:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263', diff saved to https://phabricator.wikimedia.org/P89068 and previous config saved to /var/cache/conftool/dbconfig/20260226-195943-marostegui.json
- 19:57 brett@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp2043.codfw.wmnet
- 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263', diff saved to https://phabricator.wikimedia.org/P89067 and previous config saved to /var/cache/conftool/dbconfig/20260226-194435-marostegui.json
- 19:44 brett@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp2043.codfw.wmnet
- 19:44 brett@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp2043.codfw.wmnet
- 19:35 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-backup1004.eqiad.wmnet with OS trixie
- 19:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1263 (T415786)', diff saved to https://phabricator.wikimedia.org/P89066 and previous config saved to /var/cache/conftool/dbconfig/20260226-192927-marostegui.json
- 19:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 19:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 19:27 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 19:27 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 19:18 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:18 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 19:17 hoo@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 19:16 hoo@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 19:11 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp2044.codfw.wmnet [reason: NIC firmware issues]
- 19:11 cdobbins@cumin2002: conftool action : set/pooled=no; selector: name=cp2043.codfw.wmnet [reason: NIC firmware issues]
- 19:09 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.17 refs T413808
- 19:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
- 19:00 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 18:55 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
- 18:51 swfrench@deploy2002: Finished scap sync-world: helmfile-only deploy for mesh module updates - T364245 (duration: 11m 13s)
- 18:50 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 18:50 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
- 18:48 swfrench@deploy2002: swfrench: Continuing with sync
- 18:43 swfrench@deploy2002: swfrench: helmfile-only deploy for mesh module updates - T364245 synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:42 swfrench@deploy2002: Started scap sync-world: helmfile-only deploy for mesh module updates - T364245
- 18:39 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 18:39 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host ms-fe2024.codfw.wmnet
- 18:34 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host ms-fe2024.codfw.wmnet
- 18:23 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
- 18:22 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
- 18:22 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
- 18:22 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
- 18:21 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
- 18:21 bd808@deploy2002: helmfile [staging] START helmfile.d/services/developer-portal: apply
- 18:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:17 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:15 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1004.eqiad.wmnet with OS trixie
- 18:12 bd808@deploy2002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
- 18:11 bd808@deploy2002: helmfile [codfw] START helmfile.d/services/toolhub: apply
- 18:11 bd808@deploy2002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
- 18:10 bd808@deploy2002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
- 18:09 bd808@deploy2002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
- 18:09 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:09 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 18:09 bd808@deploy2002: helmfile [staging] START helmfile.d/services/toolhub: apply
- 18:05 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2044.codfw.wmnet
- 17:58 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp2043.codfw.wmnet
- 17:30 kharlan@deploy2002: Finished scap sync-world: Backport for hcaptcha: Sanitize values of x_is_browser sent on risk_score events (T418505) (duration: 10m 00s)
- 17:27 kharlan@deploy2002: kharlan: Continuing with sync
- 17:23 kharlan@deploy2002: kharlan: Backport for hcaptcha: Sanitize values of x_is_browser sent on risk_score events (T418505) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:20 kharlan@deploy2002: Started scap sync-world: Backport for hcaptcha: Sanitize values of x_is_browser sent on risk_score events (T418505)
- 17:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
- 17:18 dcausse@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 17:16 dcausse@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 17:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2024.codfw.wmnet with OS bullseye
- 17:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 17:12 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-backup1004.eqiad.wmnet with OS trixie
- 17:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1225.eqiad.wmnet with reason: Maintenance
- 17:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T418465)', diff saved to https://phabricator.wikimedia.org/P89061 and previous config saved to /var/cache/conftool/dbconfig/20260226-171121-marostegui.json
- 17:03 javiermonton@deploy2002: Finished scap sync-world: Backport for component: mediawiki.page_html_content_change.dev0 (T418467) (duration: 11m 55s)
- 16:59 javiermonton@deploy2002: javiermonton: Continuing with sync
- 16:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P89059 and previous config saved to /var/cache/conftool/dbconfig/20260226-165613-marostegui.json
- 16:53 javiermonton@deploy2002: javiermonton: Backport for component: mediawiki.page_html_content_change.dev0 (T418467) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:53 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts moss-fe[1001-1002].eqiad.wmnet
- 16:53 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:53 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
- 16:51 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
- 16:51 javiermonton@deploy2002: Started scap sync-world: Backport for component: mediawiki.page_html_content_change.dev0 (T418467)
- 16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2024
- 16:50 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2024
- 16:49 jhathaway@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on sretest2001.codfw.wmnet with reason: T381919
- 16:47 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 16:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P89058 and previous config saved to /var/cache/conftool/dbconfig/20260226-164105-marostegui.json
- 16:33 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts moss-fe[1001-1002].eqiad.wmnet
- 16:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T418465)', diff saved to https://phabricator.wikimedia.org/P89056 and previous config saved to /var/cache/conftool/dbconfig/20260226-162556-marostegui.json
- 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1201 (T418465)', diff saved to https://phabricator.wikimedia.org/P89055 and previous config saved to /var/cache/conftool/dbconfig/20260226-162346-marostegui.json
- 16:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1201.eqiad.wmnet with reason: Maintenance
- 16:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T418465)', diff saved to https://phabricator.wikimedia.org/P89054 and previous config saved to /var/cache/conftool/dbconfig/20260226-162321-marostegui.json
- 16:13 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-backup1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P89053 and previous config saved to /var/cache/conftool/dbconfig/20260226-160812-marostegui.json
- 16:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-backup1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
- 15:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2024.codfw.wmnet with OS bullseye
- 15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P89052 and previous config saved to /var/cache/conftool/dbconfig/20260226-155304-marostegui.json
- 15:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:52 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1004.eqiad.wmnet with OS trixie
- 15:51 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:49 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 15:47 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1005.eqiad.wmnet
- 15:47 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe1004.eqiad.wmnet
- 15:47 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe1005.eqiad.wmnet
- 15:47 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe1004.eqiad.wmnet
- 15:44 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:44 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:41 dreamyjazz@deploy2002: Finished scap sync-world: Backport for SI: Populate siu_info in cusi_user from matched signals (T411118), ReassignMentees: Adjust logging level (T418194), ReassignMentees: Adjust logging level (T418194) (duration: 06m 29s)
- 15:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T418465)', diff saved to https://phabricator.wikimedia.org/P89051 and previous config saved to /var/cache/conftool/dbconfig/20260226-153756-marostegui.json
- 15:37 dreamyjazz@deploy2002: dreamyjazz, urbanecm: Continuing with sync
- 15:36 dreamyjazz@deploy2002: dreamyjazz, urbanecm: Backport for SI: Populate siu_info in cusi_user from matched signals (T411118), ReassignMentees: Adjust logging level (T418194), ReassignMentees: Adjust logging level (T418194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:35 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1187 (T418465)', diff saved to https://phabricator.wikimedia.org/P89050 and previous config saved to /var/cache/conftool/dbconfig/20260226-153545-marostegui.json
- 15:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1187.eqiad.wmnet with reason: Maintenance
- 15:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89049 and previous config saved to /var/cache/conftool/dbconfig/20260226-153521-marostegui.json
- 15:34 dreamyjazz@deploy2002: Started scap sync-world: Backport for SI: Populate siu_info in cusi_user from matched signals (T411118), ReassignMentees: Adjust logging level (T418194), ReassignMentees: Adjust logging level (T418194)
- 15:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-backup1003.eqiad.wmnet with OS trixie
- 15:24 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:24 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:23 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1019.eqiad.wmnet with OS trixie
- 15:23 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:22 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P89048 and previous config saved to /var/cache/conftool/dbconfig/20260226-152012-marostegui.json
- 15:19 sukhe: sudo cumin -b1 -s5 "C:bird%do_ipv6=true" "run-puppet-agent --enable 'merging CR 1241003'"
- 15:19 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1017.eqiad.wmnet with OS trixie
- 15:07 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-backup1003.eqiad.wmnet with reason: host reimage
- 15:05 sukhe: sudo cumin "C:bird%do_ipv6=true" "disable-puppet 'merging CR 1241003'"
- 15:05 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1018.eqiad.wmnet with OS trixie
- 15:05 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:05 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 15:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P89047 and previous config saved to /var/cache/conftool/dbconfig/20260226-150504-marostegui.json
- 15:03 moritzm: upgrade conf* nodes to facter 4 T381538
- 15:03 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:03 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:03 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-backup1003.eqiad.wmnet with reason: host reimage
- 15:02 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:02 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:02 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1019.eqiad.wmnet with reason: host reimage
- 14:58 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1017.eqiad.wmnet with reason: host reimage
- 14:52 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1019.eqiad.wmnet with reason: host reimage
- 14:52 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1017.eqiad.wmnet with reason: host reimage
- 14:50 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "Add configurations for graphql usage survey and its pipeline tests" (duration: 06m 41s)
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89046 and previous config saved to /var/cache/conftool/dbconfig/20260226-144956-marostegui.json
- 14:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1180 (T418465)', diff saved to https://phabricator.wikimedia.org/P89045 and previous config saved to /var/cache/conftool/dbconfig/20260226-144746-marostegui.json
- 14:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1180.eqiad.wmnet with reason: Maintenance
- 14:47 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1004.eqiad.wmnet with OS trixie
- 14:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89044 and previous config saved to /var/cache/conftool/dbconfig/20260226-144721-marostegui.json
- 14:47 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-backup1003.eqiad.wmnet with OS trixie
- 14:46 urbanecm@deploy2002: trainbranchbot, urbanecm: Continuing with sync
- 14:46 urbanecm@deploy2002: trainbranchbot, urbanecm: Backport for Revert "Add configurations for graphql usage survey and its pipeline tests" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:44 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "Add configurations for graphql usage survey and its pipeline tests"
- 14:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1018.eqiad.wmnet with reason: host reimage
- 14:41 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 14:40 urbanecm@deploy2002: Sync cancelled.
- 14:38 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1018.eqiad.wmnet with reason: host reimage
- 14:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:37 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:37 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1019.eqiad.wmnet with OS trixie
- 14:37 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1017.eqiad.wmnet with OS trixie
- 14:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:35 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1016.eqiad.wmnet with OS trixie
- 14:35 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:35 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1020.eqiad.wmnet with OS trixie
- 14:35 jclark@cumin1003: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:35 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:35 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:34 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:34 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:34 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:33 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:33 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:32 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 14:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P89043 and previous config saved to /var/cache/conftool/dbconfig/20260226-143213-marostegui.json
- 14:31 urbanecm@deploy2002: urbanecm, itamar: Backport for Add configurations for graphql usage survey and its pipeline tests (T414476) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:30 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:29 urbanecm@deploy2002: Started scap sync-world: Backport for Add configurations for graphql usage survey and its pipeline tests (T414476)
- 14:29 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:29 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:29 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:29 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 14:28 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:26 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] Lower wgGEMentorshipReassignMenteesBatchSize to 2500 (T418194) (duration: 06m 20s)
- 14:24 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1018.eqiad.wmnet with OS trixie
- 14:22 urbanecm@deploy2002: urbanecm: Continuing with sync
- 14:22 urbanecm@deploy2002: urbanecm: Backport for [Growth] Lower wgGEMentorshipReassignMenteesBatchSize to 2500 (T418194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:20 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1017.eqiad.wmnet with OS trixie
- 14:20 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:20 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Lower wgGEMentorshipReassignMenteesBatchSize to 2500 (T418194)
- 14:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P89042 and previous config saved to /var/cache/conftool/dbconfig/20260226-141705-marostegui.json
- 14:16 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert^2 "Remove deprecated IRS v2 configurations" (T413951), zhwiki: drop event organizer's duplicated right to remove eventparticipant from self (T418089) (duration: 07m 48s)
- 14:16 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:14 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:12 urbanecm@deploy2002: stran, urbanecm, 1f616emo: Continuing with sync
- 14:12 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1016.eqiad.wmnet with reason: host reimage
- 14:10 urbanecm@deploy2002: stran, urbanecm, 1f616emo: Backport for Revert^2 "Remove deprecated IRS v2 configurations" (T413951), zhwiki: drop event organizer's duplicated right to remove eventparticipant from self (T418089) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:08 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1020.eqiad.wmnet with reason: host reimage
- 14:08 urbanecm@deploy2002: Started scap sync-world: Backport for Revert^2 "Remove deprecated IRS v2 configurations" (T413951), zhwiki: drop event organizer's duplicated right to remove eventparticipant from self (T418089)
- 14:07 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1016.eqiad.wmnet with reason: host reimage
- 14:03 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1020.eqiad.wmnet with reason: host reimage
- 14:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89041 and previous config saved to /var/cache/conftool/dbconfig/20260226-140157-marostegui.json
- 13:59 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1168 (T418465)', diff saved to https://phabricator.wikimedia.org/P89040 and previous config saved to /var/cache/conftool/dbconfig/20260226-135946-marostegui.json
- 13:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1168.eqiad.wmnet with reason: Maintenance
- 13:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T418465)', diff saved to https://phabricator.wikimedia.org/P89039 and previous config saved to /var/cache/conftool/dbconfig/20260226-135922-marostegui.json
- 13:59 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1017.eqiad.wmnet with reason: host reimage
- 13:54 marostegui: Deploy schema change on x1 on the master with replication enabled T418480
- 13:53 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1018.eqiad.wmnet with OS trixie
- 13:53 jclark@cumin1003: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on backup1019.eqiad.wmnet with reason: host reimage
- 13:52 vgutierrez@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp[7001,7009].*} and A:cp - 3.0.17 upgrade (T417253)
- 13:52 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1016.eqiad.wmnet with OS trixie
- 13:52 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1019.eqiad.wmnet with reason: host reimage
- 13:52 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1017.eqiad.wmnet with reason: host reimage
- 13:51 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1016.eqiad.wmnet with OS trixie
- 13:46 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1020.eqiad.wmnet with OS trixie
- 13:44 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P89038 and previous config saved to /var/cache/conftool/dbconfig/20260226-134414-marostegui.json
- 13:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1016.eqiad.wmnet with reason: host reimage
- 13:41 vgutierrez@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp[7001,7009].*} and A:cp - 3.0.17 upgrade (T417253)
- 13:39 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1016.eqiad.wmnet with reason: host reimage
- 13:38 vgutierrez: fetch haproxy 3.0.17 on thirdparty/haproxy30 bullseye-wikimedia (apt.wm.o)
- 13:35 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1019.eqiad.wmnet with OS trixie
- 13:35 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1018.eqiad.wmnet with OS trixie
- 13:34 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1017.eqiad.wmnet with OS trixie
- 13:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P89036 and previous config saved to /var/cache/conftool/dbconfig/20260226-132905-marostegui.json
- 13:25 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be1097.eqiad.wmnet with OS bullseye
- 13:23 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:20 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:20 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ms-be1097
- 13:20 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1097
- 13:18 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1097
- 13:17 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1097
- 13:15 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T418465)', diff saved to https://phabricator.wikimedia.org/P89035 and previous config saved to /var/cache/conftool/dbconfig/20260226-131357-marostegui.json
- 13:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:13 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:12 urbanecm@deploy2002: Finished scap sync-world: Backport for ReassignMentees: Log more information (T418194), ReassignMentees: Log more information (T418194) (duration: 11m 00s)
- 13:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1165 (T418465)', diff saved to https://phabricator.wikimedia.org/P89034 and previous config saved to /var/cache/conftool/dbconfig/20260226-131147-marostegui.json
- 13:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 13:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1165.eqiad.wmnet with reason: Maintenance
- 13:10 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be1096.eqiad.wmnet with OS bullseye
- 13:10 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:08 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:08 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:08 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt apus-fe1004,5 - jclark@cumin1003"
- 13:05 urbanecm@deploy2002: urbanecm: Continuing with sync
- 13:05 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt apus-fe1004,5 - jclark@cumin1003"
- 13:05 urbanecm@deploy2002: urbanecm: Backport for ReassignMentees: Log more information (T418194), ReassignMentees: Log more information (T418194) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:03 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 13:03 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 13:01 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:01 urbanecm@deploy2002: Started scap sync-world: Backport for ReassignMentees: Log more information (T418194), ReassignMentees: Log more information (T418194)
- 12:57 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be1097
- 12:56 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-be1097
- 12:56 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:55 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe1005.eqiad.wmnet with OS bookworm
- 12:54 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:54 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 12:53 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:48 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:43 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be1096.eqiad.wmnet with reason: host reimage
- 12:43 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:38 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:38 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be1096.eqiad.wmnet with reason: host reimage
- 12:35 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1016.eqiad.wmnet with OS trixie
- 12:34 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe1004.eqiad.wmnet with OS bookworm
- 12:34 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:34 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 12:17 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe1005.eqiad.wmnet with reason: host reimage
- 12:15 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1097.eqiad.wmnet with OS bullseye
- 12:13 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-be1096.eqiad.wmnet with OS bullseye
- 12:13 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe1004.eqiad.wmnet with reason: host reimage
- 12:12 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe1005.eqiad.wmnet with reason: host reimage
- 12:11 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe1004.eqiad.wmnet with reason: host reimage
- 11:56 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 11:56 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 11:55 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host apus-fe1005.eqiad.wmnet with OS bookworm
- 11:54 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host apus-fe1004.eqiad.wmnet with OS bookworm
- 11:52 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 11:51 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 11:46 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 11:46 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 11:43 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-backup1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:37 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 11:28 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:26 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-eqiad
- 11:26 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:25 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-eqiad
- 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
- 11:24 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
- 11:23 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 11:23 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-backup1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:23 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:22 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:21 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 11:19 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:19 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:19 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:17 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: running tests on titan1002
- 11:13 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-backup1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:13 jclark@cumin1003: START - Cookbook sre.hosts.provision for host apus-fe1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:08 jclark@cumin1003: START - Cookbook sre.hosts.provision for host apus-fe1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:08 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-backup1003.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:07 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:07 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt apus-fe1004,5 - jclark@cumin1003"
- 11:07 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt apus-fe1004,5 - jclark@cumin1003"
- 11:05 jmm@cumin2002: END (PASS) - Cookbook sre.elasticsearch.restart-nginx (exit_code=0) rolling restart_daemons on A:cloudelastic
- 11:03 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:01 jmm@cumin2002: START - Cookbook sre.elasticsearch.restart-nginx rolling restart_daemons on A:cloudelastic
- 10:57 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wdqs-all
- 10:48 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 10:47 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 10:47 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 10:44 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wdqs-all
- 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.wdqs.restart-nginx-envoy (exit_code=0) rolling restart_daemons on A:wcqs-public
- 10:43 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 10:41 jmm@cumin2002: START - Cookbook sre.wdqs.restart-nginx-envoy rolling restart_daemons on A:wcqs-public
- 10:41 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 10:40 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 10:40 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 10:39 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7001.*
- 10:39 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 10:39 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp7001*} and A:cp - 3.0 upgrade ()
- 10:34 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp7001*} and A:cp - 3.0 upgrade ()
- 10:33 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7001.*
- 10:33 fabfur: depooling cp7001 to upgrade haproxy (T417253)
- 10:32 elukey@deploy2002: Finished scap sync-world: Test new Docker Registry backend (duration: 43m 02s)
- 10:30 ammarpad@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=wikidatawiki --logwiki=metawiki 'Luftlewis 1' 'Renamed user 4f8e749b4f28ee9e6ebc680c8c3c943d' # T418435
- 10:18 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 09:57 jmm@puppetserver1001: conftool action : set/pooled=true; selector: dnsdisc=pki,name=codfw
- 09:51 elukey@deploy2002: Started scap sync-world: Test new Docker Registry backend
- 09:47 elukey: move the Docker Registry's /v2/restricted (MediaWiki Docker image prefix) to s3/apus - T390251
- 09:44 jmm@puppetserver1001: conftool action : set/pooled=false; selector: dnsdisc=pki,name=codfw
- 09:43 urbanecm@deploy2002: mwscript-k8s job started: foreachwikiindblist growthexperiments WikimediaMaintenance:createExtensionTables.php growthexperiments
- 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1028.eqiad.wmnet with OS trixie
- 09:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
- 09:17 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
- 09:15 hashar@deploy2002: Finished deploy [gerrit/gerrit@74473c2]: wm-checks-api: add Rerun command for codehealth + inline documentation (duration: 00m 14s)
- 09:15 hashar@deploy2002: Started deploy [gerrit/gerrit@74473c2]: wm-checks-api: add Rerun command for codehealth + inline documentation
- 09:13 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
- 09:10 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 09:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
- 09:06 mvernon@cumin2002: END (FAIL) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=1) rolling restart_daemons on A:swift-fe
- 09:05 mvernon@cumin1003: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling restart_daemons on A:thanos-fe
- 09:01 mvernon@cumin1003: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling restart_daemons on A:thanos-fe
- 09:01 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1028.eqiad.wmnet with OS trixie
- 09:00 moritzm: restart FPM on Phabricator hosts to pick up OpenSSL updates
- 09:00 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe
- 07:50 root@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Hokwelum out of all services on: 2432 hosts
- 07:40 moritzm: installing openssl security updates
- 06:16 moritzm: updated thirdparty/node22 to node 20.20.0
- 06:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1263 (T415786)', diff saved to https://phabricator.wikimedia.org/P89032 and previous config saved to /var/cache/conftool/dbconfig/20260226-060809-marostegui.json
- 06:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1263.eqiad.wmnet with reason: Maintenance
- 06:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262 (T415786)', diff saved to https://phabricator.wikimedia.org/P89031 and previous config saved to /var/cache/conftool/dbconfig/20260226-060755-marostegui.json
- 05:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P89030 and previous config saved to /var/cache/conftool/dbconfig/20260226-055246-marostegui.json
- 05:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262', diff saved to https://phabricator.wikimedia.org/P89029 and previous config saved to /var/cache/conftool/dbconfig/20260226-053739-marostegui.json
- 05:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1262 (T415786)', diff saved to https://phabricator.wikimedia.org/P89028 and previous config saved to /var/cache/conftool/dbconfig/20260226-052230-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 12s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-25
- 23:36 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 23:35 swfrench@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 23:32 urbanecm@deploy2002: Finished scap sync-world: Backport for SECURITY: ReassignMentees: Handle hidden users correctly (T418222), SECURITY: ReassignMentees: Handle hidden users correctly (T418222) (duration: 07m 01s)
- 23:28 urbanecm@deploy2002: urbanecm: Continuing with sync
- 23:27 urbanecm@deploy2002: urbanecm: Backport for SECURITY: ReassignMentees: Handle hidden users correctly (T418222), SECURITY: ReassignMentees: Handle hidden users correctly (T418222) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:25 urbanecm@deploy2002: Started scap sync-world: Backport for SECURITY: ReassignMentees: Handle hidden users correctly (T418222), SECURITY: ReassignMentees: Handle hidden users correctly (T418222)
- 23:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 23:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 23:17 swfrench@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 23:15 swfrench@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 23:15 swfrench@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 23:14 swfrench@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 23:08 urbanecm@deploy2002: Finished scap sync-world: Backport for tests: Introduce MentorRemoverTest (duration: 07m 12s)
- 23:04 urbanecm@deploy2002: urbanecm: Continuing with sync
- 23:03 urbanecm@deploy2002: urbanecm: Backport for tests: Introduce MentorRemoverTest synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe2024.codfw.wmnet with OS bullseye
- 23:01 urbanecm@deploy2002: Started scap sync-world: Backport for tests: Introduce MentorRemoverTest
- 22:48 cjming: end of UTC late backport window
- 22:43 cjming@deploy2002: Finished scap sync-world: Backport for GetSecurityLogContextHandler: Add IP reputation country code (T415354), GetSecurityLogContextHandler: Add IP reputation country code (T415354) (duration: 08m 11s)
- 22:39 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:39 cjming@deploy2002: cjming, kharlan: Continuing with sync
- 22:37 cjming@deploy2002: cjming, kharlan: Backport for GetSecurityLogContextHandler: Add IP reputation country code (T415354), GetSecurityLogContextHandler: Add IP reputation country code (T415354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:37 vriley@cumin1003: START - Cookbook sre.dns.netbox
- 22:36 vriley@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frdb1008
- 22:36 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
- 22:35 cjming@deploy2002: Started scap sync-world: Backport for GetSecurityLogContextHandler: Add IP reputation country code (T415354), GetSecurityLogContextHandler: Add IP reputation country code (T415354)
- 22:31 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2023.codfw.wmnet with OS bullseye
- 22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:14 jhathaway@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 22:12 cjming@deploy2002: Finished scap sync-world: Backport for zhwiki: remove accountcreator usergroup (T418089) (duration: 06m 59s)
- 22:11 jhathaway@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
- 22:09 cjming@deploy2002: cjming, anzx: Continuing with sync
- 22:08 cjming@deploy2002: cjming, anzx: Backport for zhwiki: remove accountcreator usergroup (T418089) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:06 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 22:06 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 22:05 cjming@deploy2002: Started scap sync-world: Backport for zhwiki: remove accountcreator usergroup (T418089)
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2023.codfw.wmnet with reason: host reimage
- 21:59 jhathaway@cumin2002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
- 21:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2023.codfw.wmnet with reason: host reimage
- 21:57 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2044.codfw.wmnet with OS trixie
- 21:56 mutante: clouddumps1001/1002: removing 2 old dump files and renaming one for T417824
- 21:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2024.codfw.wmnet with OS bullseye
- 21:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2023.codfw.wmnet with OS bullseye
- 21:41 cjming@deploy2002: Finished scap sync-world: Backport for JS SDK: Added `Instrument#submitClick` for backwards compatibility (duration: 06m 28s)
- 21:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:37 cjming@deploy2002: cjming, phuedx: Continuing with sync
- 21:37 cjming@deploy2002: cjming, phuedx: Backport for JS SDK: Added `Instrument#submitClick` for backwards compatibility synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:35 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
- 21:35 cjming@deploy2002: Started scap sync-world: Backport for JS SDK: Added `Instrument#submitClick` for backwards compatibility
- 21:30 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2044.codfw.wmnet with reason: host reimage
- 21:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:26 cjming@deploy2002: Finished scap sync-world: Backport for JS SDK: Fix instrument_name field handling (duration: 06m 41s)
- 21:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:22 cjming@deploy2002: phuedx, cjming: Continuing with sync
- 21:22 cjming@deploy2002: phuedx, cjming: Backport for JS SDK: Fix instrument_name field handling synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:20 cjming@deploy2002: Started scap sync-world: Backport for JS SDK: Fix instrument_name field handling
- 21:16 cjming@deploy2002: mwscript-k8s job started: emptyUserGroup zhwiki accountcreator '--log-reason=phab:T418089' # T418089
- 21:14 cjming@deploy2002: Finished scap sync-world: Backport for zhwiki: remove accountcreator usergroup (T418089) (duration: 07m 34s)
- 21:11 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2044.codfw.wmnet with OS trixie
- 21:11 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 21:10 cjming@deploy2002: cjming, anzx: Continuing with sync
- 21:10 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 21:09 cjming@deploy2002: cjming, anzx: Backport for zhwiki: remove accountcreator usergroup (T418089) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:09 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS trixie
- 21:07 cjming@deploy2002: Started scap sync-world: Backport for zhwiki: remove accountcreator usergroup (T418089)
- 20:49 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 20:49 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 20:48 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 20:47 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 20:47 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 20:42 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 20:24 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
- 19:12 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.17 refs T413808
- 18:54 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix scoping logic for haproxy DSL - swfrench@cumin2002"
- 18:54 swfrench@cumin2002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix scoping logic for haproxy DSL - swfrench@cumin2002
- 18:53 swfrench@cumin2002: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Deploy: Fix scoping logic for haproxy DSL - swfrench@cumin2002
- 18:53 swfrench@cumin2002: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Deploy: Fix scoping logic for haproxy DSL - swfrench@cumin2002"
- 18:44 urbanecm@deploy2002: Finished scap sync-world: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees (duration: 07m 26s)
- 18:40 urbanecm@deploy2002: urbanecm: Continuing with sync
- 18:39 urbanecm@deploy2002: urbanecm: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:36 urbanecm@deploy2002: Started scap sync-world: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees
- 17:53 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "haproxy moat mode - oblivian@cumin1003"
- 17:53 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: haproxy moat mode - oblivian@cumin1003
- 17:52 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: haproxy moat mode - oblivian@cumin1003
- 17:52 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "haproxy moat mode - oblivian@cumin1003"
- 17:38 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS trixie
- 17:33 aqu@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:32 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 17:32 aqu@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/blunderbuss: apply
- 17:31 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: sync
- 17:31 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: sync
- 17:30 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 17:29 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 17:29 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 17:29 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:29 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
- 17:28 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 17:28 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
- 17:28 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 17:27 tappof: Deployment of the multi-instance Thanos Store Gateway patches for T412924: Initial groundwork completed
- 17:27 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 17:26 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 17:26 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 17:26 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:25 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
- 17:25 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:24 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:23 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:22 jgreen@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:22 jgreen@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frqueue2002.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1003"
- 17:22 jgreen@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove host frqueue2002.frack.codfw.wmnet from DNS for decommissioning - jgreen@cumin1003"
- 17:21 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:18 jgreen@cumin1003: START - Cookbook sre.dns.netbox
- 17:16 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:15 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 17:15 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
- 17:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 17:13 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 17:13 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 17:12 btullis@cumin1003: START - Cookbook sre.hosts.dhcp for host dse-k8s-worker1026.eqiad.wmnet
- 17:12 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host dse-k8s-worker1026.eqiad.wmnet
- 17:12 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 17:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 17:12 btullis@cumin1003: START - Cookbook sre.hosts.dhcp for host dse-k8s-worker1026.eqiad.wmnet
- 17:12 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 17:10 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 17:03 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 17:02 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 17:01 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 17:00 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS trixie
- 16:40 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 16:37 dancy@deploy2002: Installation of scap version "4.242.0" completed for 2 hosts
- 16:35 dancy@deploy2002: Installing scap version "4.242.0" for 2 host(s)
- 16:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1262 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260225-162659-marostegui.json
- 16:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1262.eqiad.wmnet with reason: Maintenance
- 16:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260225-162641-marostegui.json
- 16:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2096.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:27 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 16:25 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
- 16:16 urbanecm@deploy2002: Finished scap sync-world: Backport for Experiments: introduce IExperimentManager (T375198 T415536), Remove PHPDoc blocks that are 100% identical to the code (duration: 06m 44s)
- 16:15 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
- 16:15 elukey@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host pki1002.eqiad.wmnet with OS trixie
- 16:14 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:12 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:12 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:12 urbanecm@deploy2002: urbanecm: Continuing with sync
- 16:12 urbanecm@deploy2002: urbanecm: Backport for Experiments: introduce IExperimentManager (T375198 T415536), Remove PHPDoc blocks that are 100% identical to the code synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P89026 and previous config saved to /var/cache/conftool/dbconfig/20260225-161132-marostegui.json
- 16:10 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:10 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 16:09 urbanecm@deploy2002: Started scap sync-world: Backport for Experiments: introduce IExperimentManager (T375198 T415536), Remove PHPDoc blocks that are 100% identical to the code
- 16:08 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 16:07 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:06 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 16:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2095.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:04 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2096
- 16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2095
- 16:03 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2096
- 16:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2095
- 16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2095 to codfw - jhancock@cumin2002"
- 16:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2095 to codfw - jhancock@cumin2002"
- 16:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:01 urbanecm@deploy2002: Finished scap sync-world: Backport for ExperimentManager: remove geForceVariant flag handling (T416894), SiteNoticeGenerator: stop adding per-variant classes (T416894), LevelingUpManager: stop supporting multiple delay specifications (T416894), tests: Introduce MentorRemoverTest (duration: 06m 42s)
- 15:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:57 urbanecm@deploy2002: urbanecm: Continuing with sync
- 15:57 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 urbanecm@deploy2002: urbanecm: Backport for ExperimentManager: remove geForceVariant flag handling (T416894), SiteNoticeGenerator: stop adding per-variant classes (T416894), LevelingUpManager: stop supporting multiple delay specifications (T416894), tests: Introduce MentorRemoverTest synced to the testservers (see https://wikitech.wikimedia.or
- 15:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261', diff saved to https://phabricator.wikimedia.org/P89025 and previous config saved to /var/cache/conftool/dbconfig/20260225-155624-marostegui.json
- 15:54 urbanecm@deploy2002: Started scap sync-world: Backport for ExperimentManager: remove geForceVariant flag handling (T416894), SiteNoticeGenerator: stop adding per-variant classes (T416894), LevelingUpManager: stop supporting multiple delay specifications (T416894), tests: Introduce MentorRemoverTest
- 15:46 urbanecm@deploy2002: Finished scap sync-world: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees (duration: 06m 43s)
- 15:46 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 15:45 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:43 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:43 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:43 jhathaway@dns1004: END - running authdns-update
- 15:43 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:42 urbanecm@deploy2002: urbanecm: Continuing with sync
- 15:42 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:42 urbanecm@deploy2002: urbanecm: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:41 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 15:41 jhathaway@dns1004: START - running authdns-update
- 15:41 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1261 (T415786)', diff saved to https://phabricator.wikimedia.org/P89024 and previous config saved to /var/cache/conftool/dbconfig/20260225-154116-marostegui.json
- 15:41 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:41 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS trixie
- 15:40 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pki1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 15:40 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:40 urbanecm@deploy2002: Started scap sync-world: Backport for cleanup: Remove bunch of unnecessary code from ReassignMentees
- 15:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2024.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2024
- 15:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2024
- 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2024 to codfw - jhancock@cumin2002"
- 15:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2024 to codfw - jhancock@cumin2002"
- 15:33 kamila@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker2356.codfw.wmnet
- 15:33 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker2356.codfw.wmnet
- 15:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:31 elukey@cumin1003: START - Cookbook sre.hosts.provision for host pki1002.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
- 15:29 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:29 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:28 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:28 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:27 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:26 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:23 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "Remove deprecated IRS v2 configurations" (duration: 08m 11s)
- 15:20 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:19 urbanecm@deploy2002: stran, urbanecm: Continuing with sync
- 15:19 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:19 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:18 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:18 urbanecm@deploy2002: stran, urbanecm: Backport for Revert "Remove deprecated IRS v2 configurations" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:17 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 15:17 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 15:17 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 15:17 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:17 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 15:16 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:15 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "Remove deprecated IRS v2 configurations"
- 14:59 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:52 XioNoX: push pfw policies - T418305
- 14:29 moritzm: installing openssl security updates
- 14:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2045.codfw.wmnet with OS trixie
- 14:26 slyngshede@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - slyngshede@cumin1003"
- 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 14:25 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 14:10 stran@deploy2002: stran: Backport for Remove deprecated IRS v2 configurations (T413951) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:06 stran@deploy2002: Started scap sync-world: Backport for Remove deprecated IRS v2 configurations (T413951)
- 14:02 kamila@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker2356.codfw.wmnet
- 14:01 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2356.codfw.wmnet
- 14:00 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:59 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:49 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:48 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:47 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1026
- 13:47 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1026
- 13:30 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:27 aikochou@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:25 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:25 aikochou@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:24 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] Enable on all open Wikipedias (T417023) (duration: 08m 11s)
- 13:22 tappof: Starting deployment of the multi-instance Thanos Store Gateway patches for T412924
- 13:20 urbanecm@deploy2002: urbanecm: Continuing with sync
- 13:18 urbanecm@deploy2002: urbanecm: Backport for [Growth] Enable on all open Wikipedias (T417023) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:17 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 13:16 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Enable on all open Wikipedias (T417023)
- 13:12 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] Enable wmgGEMentorListJsonSchemaEnabled (T417422) (duration: 07m 24s)
- 13:08 urbanecm@deploy2002: urbanecm: Continuing with sync
- 13:07 urbanecm@deploy2002: urbanecm: Backport for [Growth] Enable wmgGEMentorListJsonSchemaEnabled (T417422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:05 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Enable wmgGEMentorListJsonSchemaEnabled (T417422)
- 13:00 urbanecm@deploy2002: Finished scap sync-world: Backport for [Growth] testwiki: Enable wmgGEMentorListJsonSchemaEnabled (T417422) (duration: 12m 00s)
- 12:56 urbanecm@deploy2002: urbanecm: Continuing with sync
- 12:50 urbanecm@deploy2002: urbanecm: Backport for [Growth] testwiki: Enable wmgGEMentorListJsonSchemaEnabled (T417422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:48 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] testwiki: Enable wmgGEMentorListJsonSchemaEnabled (T417422)
- 12:46 urbanecm@deploy2002: Finished scap sync-world: Backport for feat(DataProvider): Allow logging of read validation failures (T417893), [Growth] Log read failures when JSON schema validation is enabled (T417422 T417893) (duration: 06m 57s)
- 12:45 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:44 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:42 urbanecm@deploy2002: urbanecm: Continuing with sync
- 12:42 urbanecm@deploy2002: urbanecm: Backport for feat(DataProvider): Allow logging of read validation failures (T417893), [Growth] Log read failures when JSON schema validation is enabled (T417422 T417893) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:42 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:41 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:40 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:40 urbanecm@deploy2002: Started scap sync-world: Backport for feat(DataProvider): Allow logging of read validation failures (T417893), [Growth] Log read failures when JSON schema validation is enabled (T417422 T417893)
- 12:39 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 12:38 fabfur@cumin1003: conftool action : set/pooled=yes; selector: name=cp7009.*
- 12:38 fabfur: repooling cp7009 (T417253)
- 12:29 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 12:26 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:26 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:20 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:19 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:16 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:16 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:15 slyngshede@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - slyngshede@cumin1003"
- 12:04 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:03 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:48 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:42 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:41 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:41 fabfur@cumin1003: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp7009*} and A:cp - 3.0 upgrade ()
- 11:41 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1028.eqiad.wmnet with OS bookworm
- 11:37 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:37 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 11:36 fabfur@cumin1003: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp7009*} and A:cp - 3.0 upgrade ()
- 11:36 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:35 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:34 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:34 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 11:32 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:32 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:27 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:27 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:26 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:26 fabfur@cumin1003: conftool action : set/pooled=no; selector: name=cp7009.*
- 11:25 fabfur: depooling cp7009 to upgrade haproxy (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1242427) (T417253)
- 11:23 btullis@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1025.eqiad.wmnet with OS bookworm
- 11:23 btullis@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin2002"
- 11:21 btullis@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin2002"
- 11:20 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 11:20 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1028.eqiad.wmnet with reason: host reimage
- 11:19 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 11:17 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1027.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:15 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:14 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:14 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:14 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1028.eqiad.wmnet with reason: host reimage
- 11:13 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:13 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:13 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:13 btullis@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 11:09 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 11:04 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:03 btullis@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1025.eqiad.wmnet with reason: host reimage
- 11:03 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:02 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1026.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:00 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1028.eqiad.wmnet with OS bookworm
- 10:57 btullis@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1025.eqiad.wmnet with reason: host reimage
- 10:55 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 10:53 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1027.eqiad.wmnet with OS bookworm
- 10:46 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1026.eqiad.wmnet with OS bookworm
- 10:46 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 10:43 btullis@cumin2002: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1025.eqiad.wmnet with OS bookworm
- 10:41 btullis@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:39 btullis@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:39 btullis@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:37 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 10:36 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 10:36 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-worker1024.eqiad.wmnet with OS bookworm
- 10:36 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 10:36 btullis@cumin2002: START - Cookbook sre.hosts.provision for host dse-k8s-worker1025.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 10:35 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - btullis@cumin1003"
- 10:28 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 10:23 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dborch1003.eqiad.wmnet with OS trixie
- 10:18 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-worker1024.eqiad.wmnet with reason: host reimage
- 10:12 btullis@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-worker1024.eqiad.wmnet with reason: host reimage
- 10:09 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1003.eqiad.wmnet with reason: host reimage
- 10:02 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1003.eqiad.wmnet with reason: host reimage
- 09:57 btullis@cumin1003: START - Cookbook sre.hosts.reimage for host dse-k8s-worker1024.eqiad.wmnet with OS bookworm
- 09:54 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host dborch1003.eqiad.wmnet with OS trixie
- 09:54 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:54 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
- 09:53 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
- 09:52 elukey: uploaded python3-wmflib_3.0.0 to apt.wikimedia.org bullseye-wikimedia,bookworm-wikimedia,trixie-wikimedia
- 09:48 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 09:22 XioNoX: push pfw policies - T418305
- 08:46 ammarpad@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mediawikiwiki --logwiki=metawiki Egortropeano Fortuna1992 # T418331
- 08:45 ammarpad@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=gawiki --logwiki=metawiki DroopyDoggy AlterDiegos # T418330
- 08:20 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2045.codfw.wmnet with reason: host reimage
- 08:14 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2045.codfw.wmnet with reason: host reimage
- 07:59 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 06:16 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1023.eqiad.wmnet with OS trixie
- 05:59 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
- 05:54 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1023.eqiad.wmnet with reason: host reimage
- 05:38 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1023.eqiad.wmnet with OS trixie
- 02:25 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1261 (T415786)', diff saved to https://phabricator.wikimedia.org/P89022 and previous config saved to /var/cache/conftool/dbconfig/20260225-022502-marostegui.json
- 02:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1261.eqiad.wmnet with reason: Maintenance
- 02:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89021 and previous config saved to /var/cache/conftool/dbconfig/20260225-022446-marostegui.json
- 02:23 ryankemper: [WDQS] Restart codfw wdqs-main
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 49s)
- 02:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260', diff saved to https://phabricator.wikimedia.org/P89020 and previous config saved to /var/cache/conftool/dbconfig/20260225-020938-marostegui.json
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260', diff saved to https://phabricator.wikimedia.org/P89019 and previous config saved to /var/cache/conftool/dbconfig/20260225-015430-marostegui.json
- 01:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89018 and previous config saved to /var/cache/conftool/dbconfig/20260225-013921-marostegui.json
- 00:26 zabe@deploy2002: Finished scap sync-world: Backport for Start reading from new file tables on all small wikis (T416548) (duration: 06m 40s)
- 00:22 zabe@deploy2002: zabe: Continuing with sync
- 00:21 zabe@deploy2002: zabe: Backport for Start reading from new file tables on all small wikis (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:19 zabe@deploy2002: Started scap sync-world: Backport for Start reading from new file tables on all small wikis (T416548)
- 00:11 zabe: zabe@deploy2002:~$ foreachwiki extensions/TimedMediaHandler/maintenance/migrateTranscodeStates.php # T415064
- 00:10 zabe@deploy2002: Finished scap sync-world: Backport for Update documenation to reference config-schema.php (duration: 07m 20s)
- 00:06 zabe@deploy2002: zabe: Continuing with sync
- 00:05 zabe@deploy2002: zabe: Backport for Update documenation to reference config-schema.php synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:02 zabe@deploy2002: Started scap sync-world: Backport for Update documenation to reference config-schema.php
2026-02-24
- 23:41 swfrench-wmf: built envoy images (1.35.7-3) - T364245
- 23:29 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ncmonitor1001.eqiad.wmnet with OS trixie
- 23:04 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs2*} AND NOT P{wdqs2012*}' 'systemctl restart wdqs-blazegraph'` (2012 still seems healthy, rest are all not)
- 22:59 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
- 22:58 ryankemper: [WDQS] `ryankemper@cumin2002:~$ sudo -E cumin 'A:wdqs-main AND P{wdqs1*}' 'systemctl restart wdqs-blazegraph'`
- 22:52 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ncmonitor1001.eqiad.wmnet with reason: host reimage
- 22:40 brett@cumin2002: START - Cookbook sre.hosts.reimage for host ncmonitor1001.eqiad.wmnet with OS trixie
- 22:37 brett: import ncmonitor 3.1.0~deb13u1 into trixie-wikimedia (T401832)
- 22:35 hashar: Restarted Gerrit due to a replication config issue
- 21:23 aaron@deploy2002: Finished scap sync-world: Backport for Switch math sandbox specs to plain wikimedia.org (T418188), Copy rest_v1-wikimedia.json to standard-docroot (T418188) (duration: 07m 20s)
- 21:19 aaron@deploy2002: aaron: Continuing with sync
- 21:19 aaron@deploy2002: aaron: Backport for Switch math sandbox specs to plain wikimedia.org (T418188), Copy rest_v1-wikimedia.json to standard-docroot (T418188) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:16 aaron@deploy2002: Started scap sync-world: Backport for Switch math sandbox specs to plain wikimedia.org (T418188), Copy rest_v1-wikimedia.json to standard-docroot (T418188)
- 20:08 dduvall@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.17 refs T413808
- 19:56 dduvall@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.17 refs T413808 (duration: 44m 39s)
- 19:41 jhathaway@dns1004: END - running authdns-update
- 19:40 jhathaway@dns1004: START - running authdns-update
- 19:26 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host dse-k8s-worker1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:20 btullis@cumin1003: START - Cookbook sre.hosts.provision for host dse-k8s-worker1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:19 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1028
- 19:18 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1028
- 19:18 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1027
- 19:18 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1027
- 19:18 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1026
- 19:18 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1026
- 19:17 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1025
- 19:15 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1025
- 19:15 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1024
- 19:14 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
- 19:14 btullis@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host dse-k8s-worker1024
- 19:13 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
- 19:12 dduvall@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.17 refs T413808
- 19:09 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:09 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
- 19:09 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Updating records after renaming and moving vlan of some an-worker hosts - btullis@cumin1003"
- 19:05 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 18:56 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 18:53 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts dse-k8s-worker1024.eqiad.wmnet
- 18:53 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:53 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1024.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 18:50 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dse-k8s-worker1024.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 18:45 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 18:40 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts dse-k8s-worker1024.eqiad.wmnet
- 18:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1118,1131,1133-1134].eqiad.wmnet
- 18:37 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:37 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1118,1131,1133-1134].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 18:34 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1118,1131,1133-1134].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 18:29 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 18:18 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker[1118,1131,1133-1134].eqiad.wmnet
- 17:43 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from an-worker1117 to dse-k8s-worker1024
- 17:43 btullis@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dse-k8s-worker1024
- 17:41 btullis@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host dse-k8s-worker1024
- 17:41 btullis@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dse-k8s-worker1024 on all recursors
- 17:41 btullis@cumin1003: START - Cookbook sre.dns.wipe-cache dse-k8s-worker1024 on all recursors
- 17:41 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:41 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1117 to dse-k8s-worker1024 - btullis@cumin1003"
- 17:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89013 and previous config saved to /var/cache/conftool/dbconfig/20260224-174107-marostegui.json
- 17:39 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming an-worker1117 to dse-k8s-worker1024 - btullis@cumin1003"
- 17:29 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 17:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89012 and previous config saved to /var/cache/conftool/dbconfig/20260224-172559-marostegui.json
- 17:17 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 17:17 btullis@cumin1003: START - Cookbook sre.hosts.rename from an-worker1117 to dse-k8s-worker1024
- 17:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248', diff saved to https://phabricator.wikimedia.org/P89011 and previous config saved to /var/cache/conftool/dbconfig/20260224-171051-marostegui.json
- 16:57 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89010 and previous config saved to /var/cache/conftool/dbconfig/20260224-165542-marostegui.json
- 16:52 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts an-worker[1119-1130,1135-1141].eqiad.wmnet
- 16:52 btullis@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:52 btullis@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1119-1130,1135-1141].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 16:51 btullis@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker[1119-1130,1135-1141].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1003"
- 16:43 btullis@cumin1003: START - Cookbook sre.dns.netbox
- 16:09 brennen@deploy2002: Finished deploy [phabricator/deployment@01119c5]: re-deploy phab1004 for T418256 (for real this time) (duration: 01m 01s)
- 16:08 brennen@deploy2002: Started deploy [phabricator/deployment@01119c5]: re-deploy phab1004 for T418256 (for real this time)
- 16:08 brennen@deploy2002: Finished deploy [phabricator/deployment@01119c5]: re-deploy phab2002 for T418256 (for real this time) (duration: 00m 31s)
- 16:07 brennen@deploy2002: Started deploy [phabricator/deployment@01119c5]: re-deploy phab2002 for T418256 (for real this time)
- 16:06 brennen@deploy2002: Finished deploy [phabricator/deployment@aad109e]: deploy phab1004 for T418256 (duration: 01m 55s)
- 16:04 brennen@deploy2002: Started deploy [phabricator/deployment@aad109e]: deploy phab1004 for T418256
- 16:03 brennen@deploy2002: Finished deploy [phabricator/deployment@aad109e]: deploy phab2002 for T418256 (duration: 01m 28s)
- 16:02 brennen@deploy2002: Started deploy [phabricator/deployment@aad109e]: deploy phab2002 for T418256
- 16:00 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 16:00 mutante: gerrit2003 was restarted for maintenance reasons - expecting recovery soon
- 15:59 inflatador: bking@local restarting wdqs codfw main to deal with 5xx errors
- 15:57 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 15:56 dzahn@cumin2002: END (FAIL) - Cookbook sre.gerrit.restart-gerrit (exit_code=99) Restarting Gerrit on gerrit2003
- 15:55 dzahn@cumin2002: START - Cookbook sre.gerrit.restart-gerrit Restarting Gerrit on gerrit2003
- 15:53 ayounsi@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 15:52 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
- 15:52 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 15:50 sukhe@dns1004: END - running authdns-update
- 15:49 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:48 sukhe: enable IPv6 glue records for ns[02].wikimedia.org: T81605
- 15:48 sukhe@dns1004: START - running authdns-update
- 15:46 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: sync
- 15:46 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: sync
- 15:44 urbanecm: Remove Phabricator MFA for EMcFarland-WMF (T418260)
- 15:44 dwisehaupt@dns1004: END - running authdns-update
- 15:43 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: sync
- 15:42 dwisehaupt@dns1004: START - running authdns-update
- 15:42 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: sync
- 15:40 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 15:38 btullis@cumin1003: START - Cookbook sre.hosts.decommission for hosts an-worker[1119-1130,1135-1141].eqiad.wmnet
- 15:35 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab2002.codfw.wmnet,phab[1004-1005].eqiad.wmnet with reason: T418256
- 15:19 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 15:17 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:15 arnaudb@cumin1003: END (FAIL) - Cookbook sre.gerrit.sync-instances (exit_code=99) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:12 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:11 arnaudb@cumin1003: END (ERROR) - Cookbook sre.gerrit.sync-instances (exit_code=97) sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:08 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit2002.wikimedia.org
- 15:02 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:01 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bookworm
- 14:59 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:53 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:43 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
- 14:37 arnaudb@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
- 14:34 slyngshede@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
- 14:29 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 14:17 arnaudb@cumin1003: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bookworm
- 14:14 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2002.wikimedia.org with OS bookworm
- 14:11 awight@deploy2002: Finished scap sync-world: Backport for Subreferencing pilot wikis, phase 2 (T418209) (duration: 08m 16s)
- 14:07 awight@deploy2002: awight: Continuing with sync
- 14:05 awight@deploy2002: awight: Backport for Subreferencing pilot wikis, phase 2 (T418209) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:03 awight@deploy2002: Started scap sync-world: Backport for Subreferencing pilot wikis, phase 2 (T418209)
- 13:54 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
- 13:53 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
- 13:50 arnaudb@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2002.wikimedia.org with reason: host reimage
- 13:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:44 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:38 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 13:30 arnaudb@cumin1003: START - Cookbook sre.hosts.reimage for host gerrit2002.wikimedia.org with OS bookworm
- 13:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts moss-fe[2001-2002].codfw.wmnet
- 13:29 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:29 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
- 13:29 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: moss-fe[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - mvernon@cumin2002"
- 13:27 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
- 13:27 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
- 13:27 arnaudb@dns1004: END - running authdns-update
- 13:27 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:26 fceratto@dns1004: END - running authdns-update
- 13:26 arnaudb@dns1004: START - running authdns-update
- 13:25 fceratto@dns1004: START - running authdns-update
- 13:24 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 13:24 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:24 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deploy manual changes from netbox - fceratto@cumin1003"
- 13:21 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 13:20 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 13:20 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
- 13:20 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit-replica.discovery.wmnet gerrit-spare.discovery.wmnet on all recursors
- 13:15 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Deploy manual changes from netbox - fceratto@cumin1003"
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 13:14 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 13:11 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 13:07 fceratto@dns1004: START - running authdns-update
- 13:04 fceratto@dns1004: START - running authdns-update
- 13:02 arnaudb@dns1004: START - running authdns-update
- 13:01 fceratto@dns1004: START - running authdns-update
- 12:55 slyngshede@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cp2045.codfw.wmnet with OS trixie
- 12:52 fceratto@dns1004: START - running authdns-update
- 12:40 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
- 12:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 12:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 12:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 12:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 12:36 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 12:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 12:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 12:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 12:35 aikochou@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 12:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:32 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:32 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 12:30 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 12:30 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 12:30 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 12:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 12:29 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 12:23 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-staging-codfw: maintenance
- 12:23 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-staging-codfw: maintenance
- 12:05 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 12:05 slyngshede@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
- 11:52 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 11:52 slyngshede@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
- 11:48 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1204.eqiad.wmnet
- 11:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1260 (T415786)', diff saved to https://phabricator.wikimedia.org/P89008 and previous config saved to /var/cache/conftool/dbconfig/20260224-114242-marostegui.json
- 11:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1260.eqiad.wmnet with reason: Maintenance
- 11:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89007 and previous config saved to /var/cache/conftool/dbconfig/20260224-114217-marostegui.json
- 11:40 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1204.eqiad.wmnet
- 11:38 fceratto@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet
- 11:38 fceratto@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 11:36 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 11:29 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts moss-fe[2001-2002].codfw.wmnet
- 11:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89006 and previous config saved to /var/cache/conftool/dbconfig/20260224-112708-marostegui.json
- 11:21 Emperor: depool moss-fe200{1,2} prep for decommissioning T416387
- 11:14 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance
- 11:14 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance
- 11:14 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2005.codfw.wmnet
- 11:13 mvernon@cumin2002: conftool action : set/pooled=yes; selector: service=apus,name=apus-fe2004.codfw.wmnet
- 11:13 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe2005.codfw.wmnet
- 11:13 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 11:12 mvernon@cumin2002: conftool action : set/weight=40; selector: service=apus,name=apus-fe2004.codfw.wmnet
- 11:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252', diff saved to https://phabricator.wikimedia.org/P89005 and previous config saved to /var/cache/conftool/dbconfig/20260224-111159-marostegui.json
- 11:00 dpogorzelski@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw
- 11:00 dpogorzelski@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=eqiad
- 11:00 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in eqiad/ml-serve-eqiad: maintenance
- 10:59 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in eqiad/ml-serve-eqiad: maintenance
- 10:58 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade
- 10:57 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 10:57 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 10:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P89003 and previous config saved to /var/cache/conftool/dbconfig/20260224-105651-marostegui.json
- 10:56 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 10:56 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 10:56 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 10:55 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 10:55 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 10:55 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 10:54 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 10:54 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:54 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 10:54 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
- 10:53 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 10:53 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 10:52 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
- 10:52 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 10:51 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 10:51 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 10:51 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 10:51 fceratto@cumin1003: START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
- 10:51 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 10:51 fceratto@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host dborch1003.eqiad.wmnet
- 10:51 fceratto@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dborch1003.eqiad.wmnet with OS trixie
- 10:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:41 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:36 slyngshede@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
- 10:35 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 10:34 slyngshede@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp2045.codfw.wmnet with OS trixie
- 10:28 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:28 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:24 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:24 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:23 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:23 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:23 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:22 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:22 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:22 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:21 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:21 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:19 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:18 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:18 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:17 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:17 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
- 10:17 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
- 10:02 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host dborch1003.eqiad.wmnet with OS trixie
- 10:00 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-serve-eqiad: Kubernetes upgrade
- 09:55 dpogorzelski@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
- 09:55 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in eqiad/ml-serve-eqiad: maintenance
- 09:54 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
- 09:54 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
- 09:54 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in eqiad/ml-serve-eqiad: maintenance
- 09:54 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1003.eqiad.wmnet on all recursors
- 09:54 fceratto@cumin1003: START - Cookbook sre.dns.wipe-cache dborch1003.eqiad.wmnet on all recursors
- 09:54 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:54 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
- 09:53 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1003.eqiad.wmnet - fceratto@cumin1003"
- 09:45 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 09:45 fceratto@cumin1003: START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
- 09:45 fceratto@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1003.eqiad.wmnet
- 09:45 fceratto@cumin1003: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
- 09:45 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 09:44 fceratto@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 09:43 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2045.codfw.wmnet with OS trixie
- 09:42 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 09:42 fceratto@cumin1003: START - Cookbook sre.ganeti.makevm for new host dborch1003.eqiad.wmnet
- 09:08 kharlan@deploy2002: Finished scap sync-world: Backport for IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718) (duration: 09m 29s)
- 09:04 kharlan@deploy2002: kharlan: Continuing with sync
- 09:01 kharlan@deploy2002: kharlan: Backport for IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:59 kharlan@deploy2002: Started scap sync-world: Backport for IPInfo: Grant ipinfo-view-arbitrary-ip to checkuser group (T374718)
- 08:40 mlitn@deploy2002: Finished scap sync-world: Backport for Minerva TOC: reserve space for the article page heading button (T417932) (duration: 06m 33s)
- 08:36 mlitn@deploy2002: mlitn: Continuing with sync
- 08:35 mlitn@deploy2002: mlitn: Backport for Minerva TOC: reserve space for the article page heading button (T417932) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:33 mlitn@deploy2002: Started scap sync-world: Backport for Minerva TOC: reserve space for the article page heading button (T417932)
- 08:15 mlitn@deploy2002: Finished scap sync-world: Backport for Squashed diff to master (duration: 07m 23s)
- 08:11 mlitn@deploy2002: mlitn: Continuing with sync
- 08:10 mlitn@deploy2002: mlitn: Backport for Squashed diff to master synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:08 mlitn@deploy2002: Started scap sync-world: Backport for Squashed diff to master
- 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2248 (T415786)', diff saved to https://phabricator.wikimedia.org/P89002 and previous config saved to /var/cache/conftool/dbconfig/20260224-074831-marostegui.json
- 07:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2248.codfw.wmnet with reason: Maintenance
- 07:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T415786)', diff saved to https://phabricator.wikimedia.org/P89001 and previous config saved to /var/cache/conftool/dbconfig/20260224-074806-marostegui.json
- 07:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P89000 and previous config saved to /var/cache/conftool/dbconfig/20260224-073258-marostegui.json
- 07:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247', diff saved to https://phabricator.wikimedia.org/P88999 and previous config saved to /var/cache/conftool/dbconfig/20260224-071750-marostegui.json
- 07:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88998 and previous config saved to /var/cache/conftool/dbconfig/20260224-070241-marostegui.json
- 06:08 marostegui: Deploy schema change on dbstore1007:3314 T415786
- 06:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Schema change
- 05:56 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1011: Repooling pc1 after migration to Debian trixie
- 05:56 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 05:56 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 05:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool pc1011: Repooling pc1 after migration to Debian trixie
- 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.14 (duration: 01m 10s)
- 01:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 01:50 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 01:41 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 01:41 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
2026-02-23
- 23:18 sbassett: Deployed security fix for T417603
- 23:09 sbassett: Deployed security fix for T416090
- 22:51 sbassett: Deployed security fix for T418122
- 22:31 dzahn@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 22:21 dzahn@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 22:04 tgr_: running foreachwikiindblist sul CentralAuth:UpdateAutomaticGlobalGroupMembership --local-group=bot
- 22:04 tgr_: UTC late deploys done
- 22:04 tgr@deploy2002: Finished scap sync-world: Backport for Configure rate limit class for local bots (and local-bot global group) (T415588) (duration: 10m 06s)
- 22:00 tgr@deploy2002: tgr, matmarex: Continuing with sync
- 21:56 tgr@deploy2002: tgr, matmarex: Backport for Configure rate limit class for local bots (and local-bot global group) (T415588) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:54 tgr@deploy2002: Started scap sync-world: Backport for Configure rate limit class for local bots (and local-bot global group) (T415588)
- 21:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1252 (T415786)', diff saved to https://phabricator.wikimedia.org/P88994 and previous config saved to /var/cache/conftool/dbconfig/20260223-214512-marostegui.json
- 21:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1252.eqiad.wmnet with reason: Maintenance
- 21:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T415786)', diff saved to https://phabricator.wikimedia.org/P88993 and previous config saved to /var/cache/conftool/dbconfig/20260223-214447-marostegui.json
- 21:42 tgr_: running foreachwikiindblist sul CentralAuth:UpdateAutomaticGlobalGroupMembership --local-group=checkuser --local-group=suppress
- 21:41 tgr@deploy2002: Finished scap sync-world: Backport for Revert "extension-list: add a bogus extension to test l10n-update" (T411516), Pre-deploy Comparative Reader Research survey on enwiki (T417829), Pre-deploy Comparative Reader Research survey on eswiki (T417834), [[gerrit:1242493|Edit check: catch various places where an error could derail things (T406836 T41
- 21:35 tgr@deploy2002: matmarex, bd808, dani, tgr, kemayo: Continuing with sync
- 21:32 tgr@deploy2002: matmarex, bd808, dani, tgr, kemayo: Backport for Revert "extension-list: add a bogus extension to test l10n-update" (T411516), Pre-deploy Comparative Reader Research survey on enwiki (T417829), Pre-deploy Comparative Reader Research survey on eswiki (T417834), [[gerrit:1242493|Edit check: catch various places where an error could derail things (T
- 21:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P88992 and previous config saved to /var/cache/conftool/dbconfig/20260223-212938-marostegui.json
- 21:27 tgr@deploy2002: Started scap sync-world: Backport for Revert "extension-list: add a bogus extension to test l10n-update" (T411516), Pre-deploy Comparative Reader Research survey on enwiki (T417829), Pre-deploy Comparative Reader Research survey on eswiki (T417834), [[gerrit:1242493|Edit check: catch various places where an error could derail things (T406836 T418
- 21:18 toyofuku@deploy2002: Finished scap sync-world: Backport for Migrate default user preference configuration to Community Configuration (T415355), Change "Learn more" link underneath Baby Globe on Minerva (T417077), i18n: Update community configuration copy (T415346) (duration: 40m 58s)
- 21:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P88991 and previous config saved to /var/cache/conftool/dbconfig/20260223-211430-marostegui.json
- 21:05 toyofuku@deploy2002: bwang, jdrewniak, toyofuku: Continuing with sync
- 21:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2023.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:01 toyofuku@deploy2002: bwang, jdrewniak, toyofuku: Backport for Migrate default user preference configuration to Community Configuration (T415355), Change "Learn more" link underneath Baby Globe on Minerva (T417077), i18n: Update community configuration copy (T415346) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be v
- 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2023
- 21:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2023
- 21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2023
- 21:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2023
- 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2023 to codfw - jhancock@cumin2002"
- 20:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2023 to codfw - jhancock@cumin2002"
- 20:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T415786)', diff saved to https://phabricator.wikimedia.org/P88990 and previous config saved to /var/cache/conftool/dbconfig/20260223-205921-marostegui.json
- 20:55 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2022.codfw.wmnet with OS bullseye
- 20:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:37 toyofuku@deploy2002: Started scap sync-world: Backport for Migrate default user preference configuration to Community Configuration (T415355), Change "Learn more" link underneath Baby Globe on Minerva (T417077), i18n: Update community configuration copy (T415346)
- 20:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2022.codfw.wmnet with reason: host reimage
- 20:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2022.codfw.wmnet with reason: host reimage
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2022.codfw.wmnet with OS bullseye
- 19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:35 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 19:34 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 19:34 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 19:32 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 19:26 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 19:25 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 19:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2022
- 19:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2022
- 19:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2022 to codfw - jhancock@cumin2002"
- 19:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2022 to codfw - jhancock@cumin2002"
- 19:10 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2022 to codfw - jhancock@cumin2002"
- 19:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2022 to codfw - jhancock@cumin2002"
- 19:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe2021.codfw.wmnet with OS bullseye
- 19:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 18:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe2021.codfw.wmnet with reason: host reimage
- 18:10 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88988 and previous config saved to /var/cache/conftool/dbconfig/20260223-181011-marostegui.json
- 18:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2247.codfw.wmnet with reason: Maintenance
- 18:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T415786)', diff saved to https://phabricator.wikimedia.org/P88987 and previous config saved to /var/cache/conftool/dbconfig/20260223-180947-marostegui.json
- 18:08 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe2021.codfw.wmnet with reason: host reimage
- 18:02 ayounsi@cumin1003: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw1-23-ulsfo
- 18:01 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 18:00 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-eqiad: trixie upgrade
- 18:00 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1005.eqiad.wmnet with OS trixie
- 17:58 ayounsi@cumin1003: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 17:58 ayounsi@cumin1003: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 17:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P88986 and previous config saved to /var/cache/conftool/dbconfig/20260223-175438-marostegui.json
- 17:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246', diff saved to https://phabricator.wikimedia.org/P88985 and previous config saved to /var/cache/conftool/dbconfig/20260223-173930-marostegui.json
- 17:38 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
- 17:35 dpogorzelski@cumin1003: conftool action : set/pooled=true; selector: dnsdisc=recommendation-api,name=codfw
- 17:32 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1005.eqiad.wmnet with reason: host reimage
- 17:31 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) pool all services in codfw/ml-serve-codfw: maintenance
- 17:30 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster pool all services in codfw/ml-serve-codfw: maintenance
- 17:28 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster ml-serve-codfw: Kubernetes upgrade
- 17:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-fe2021.codfw.wmnet with OS bullseye
- 17:25 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
- 17:25 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
- 17:25 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host frqueue2004
- 17:25 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
- 17:25 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'edit-check' for release 'main' .
- 17:24 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host frqueue2004
- 17:24 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
- 17:24 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 17:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2246 (T415786)', diff saved to https://phabricator.wikimedia.org/P88983 and previous config saved to /var/cache/conftool/dbconfig/20260223-172421-marostegui.json
- 17:24 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
- 17:24 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 17:23 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:23 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revise-tone-task-generator' for release 'main' .
- 17:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:23 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
- 17:23 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
- 17:22 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
- 17:22 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
- 17:22 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
- 17:21 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
- 17:21 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
- 17:20 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
- 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frqueue2004 to codfw - jhancock@cumin2002"
- 17:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frqueue2004 to codfw - jhancock@cumin2002"
- 17:18 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster1005.eqiad.wmnet with OS trixie
- 17:16 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 17:14 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1004.eqiad.wmnet with OS trixie
- 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:07 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
- 17:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:06 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 17:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 17:01 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:59 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:58 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 16:57 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 16:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 16:54 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
- 16:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:48 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:47 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1004.eqiad.wmnet with reason: host reimage
- 16:45 jdrewniak@deploy2002: Finished scap sync-world: Backport for Updating portals submodule for Wikipedia 25 birthday (T128546) (duration: 06m 59s)
- 16:41 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 16:40 jdrewniak@deploy2002: jdrewniak: Backport for Updating portals submodule for Wikipedia 25 birthday (T128546) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:38 jdrewniak@deploy2002: Started scap sync-world: Backport for Updating portals submodule for Wikipedia 25 birthday (T128546)
- 16:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:36 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe2021
- 16:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe2021
- 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 16:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-fe2021 to codfw - jhancock@cumin2002"
- 16:32 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster1004.eqiad.wmnet with OS trixie
- 16:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:31 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:30 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:30 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:30 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:30 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:29 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster1003.eqiad.wmnet with OS trixie
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:29 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:28 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:26 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:19 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:18 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 16:17 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 16:14 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:12 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:10 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:10 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:10 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:09 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
- 16:09 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
- 16:08 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
- 16:05 pt1979@cumin2002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device asw1-23-ulsfo
- 16:04 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-23-ulsfo
- 16:04 pt1979@cumin2002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-22-ulsfo
- 16:04 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-22-ulsfo
- 16:02 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster1003.eqiad.wmnet with reason: host reimage
- 15:48 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster1003.eqiad.wmnet with OS trixie
- 15:44 jayme@cumin1003: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-eqiad: trixie upgrade
- 15:37 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-codfw@codfw
- 15:37 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 15:25 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 15:23 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-codfw@codfw
- 15:23 jayme@cumin1003: END (ERROR) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=97) for alias: wikikube-staging-worker-codfw@codfw
- 15:23 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-codfw@codfw
- 15:20 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-eqiad@eqiad
- 15:20 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:19 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:14 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-eqiad@eqiad
- 15:12 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-serve-codfw: Kubernetes upgrade
- 15:09 jayme@deploy1003: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=kubernetes-staging,service=kubesvc
- 15:09 jayme@deploy1003: conftool action : set/weight=10; selector: dc=eqiad,cluster=kubernetes-staging,service=kubesvc
- 15:07 jayme@deploy1003: conftool action : gfet; selector: dc=eqiad,cluster=kubesvc
- 14:59 dpogorzelski@cumin1003: conftool action : set/pooled=false; selector: dnsdisc=recommendation-api,name=codfw
- 14:55 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-serve-codfw: maintenance
- 14:54 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-serve-codfw: maintenance
- 14:45 moritzm: installing busybox updates from Bookworm point release
- 14:40 tgr_: UTC afternoon deploy window over (skipped)
- 14:38 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool esams [reason: no reason specified, no task ID specified]
- 14:38 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool esams [reason: no reason specified, no task ID specified]
- 14:38 sukhe: dummy sre.dns.admin run
- 14:15 Amir1: cleaning useless rows of bot_passwords (T417977)
- 14:06 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:05 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: trixie upgrade
- 14:05 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2005.codfw.wmnet with OS trixie
- 14:01 Raine: added Calico BGPPeers for ToR switches in all k8s clusters
- 14:00 kamila@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 13:58 kamila@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 13:58 kamila@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:57 kamila@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:55 kamila@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:54 kamila@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 13:53 kamila@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:52 kamila@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:51 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 13:50 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:50 kamila@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 13:50 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1096.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:49 kamila@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:44 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 13:37 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2005.codfw.wmnet with reason: host reimage
- 13:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1097.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-be1096.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 Emperor: start reef 18.2.7 upgrade of codfw apus frontends T417396
- 13:23 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:22 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:20 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster2005.codfw.wmnet with OS trixie
- 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netflow7002.magru.wmnet
- 13:16 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2004.codfw.wmnet with OS trixie
- 13:14 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netflow7002.magru.wmnet
- 13:13 kamila@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:12 kamila@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:11 kamila@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 13:10 kamila@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 13:09 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:08 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 13:07 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:06 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2011.codfw.wmnet with OS trixie
- 13:06 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 13:06 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 13:05 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Filter for suppressed usernames (T417868) (duration: 08m 39s)
- 13:05 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1020.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:04 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 13:02 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1018.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:02 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1016.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:02 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1017.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:02 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1019.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:02 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 12:59 dreamyjazz@deploy2002: dreamyjazz: Backport for Filter for suppressed usernames (T417868) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:58 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 12:58 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1016-20 - jclark@cumin1003"
- 12:58 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1016-20 - jclark@cumin1003"
- 12:57 dreamyjazz@deploy2002: Started scap sync-world: Backport for Filter for suppressed usernames (T417868)
- 12:55 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 12:54 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
- 12:48 Emperor: start reef 18.2.7 upgrade of codfw apus storage nodes T417396
- 12:48 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2004.codfw.wmnet with reason: host reimage
- 12:42 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
- 12:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:39 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:39 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2011.codfw.wmnet with reason: host reimage
- 12:36 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 12:36 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert^2 "[Growth] Force legacy validation of GrowthMentorList" (T417422) (duration: 15m 50s)
- 12:35 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 12:34 kamila@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 12:34 kamila@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 12:34 claime: Rebuilding envoy image - T364245
- 12:34 kamila@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 12:33 kamila@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 12:33 kamila@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 12:32 kamila@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 12:32 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:32 urbanecm@deploy2002: urbanecm: Continuing with sync
- 12:30 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 12:28 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster2004.codfw.wmnet with OS trixie
- 12:24 jayme@cumin1003: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: trixie upgrade
- 12:22 urbanecm@deploy2002: urbanecm: Backport for Revert^2 "[Growth] Force legacy validation of GrowthMentorList" (T417422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:20 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host pc2011.codfw.wmnet with OS trixie
- 12:20 urbanecm@deploy2002: Started scap sync-world: Backport for Revert^2 "[Growth] Force legacy validation of GrowthMentorList" (T417422)
- 12:18 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1011.eqiad.wmnet with OS trixie
- 12:08 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
- 12:07 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
- 12:06 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
- 12:06 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
- 12:05 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
- 12:04 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
- 12:04 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 12:04 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 11:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
- 11:57 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 11:56 derick@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=ptwiki --logwiki=metawiki 'Bianca Fernandes Dias' Greenlighrts # T418113
- 11:56 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 11:54 kamila@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker2356.codfw.wmnet
- 11:53 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2356.codfw.wmnet
- 11:53 cgoubert@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 11:53 cgoubert@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 11:53 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1011.eqiad.wmnet with reason: host reimage
- 11:49 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.reimage-stacked-control-plane (exit_code=0) Reimaging k8s control planes of cluster staging-codfw: trixie upgrade
- 11:49 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestagemaster2003.codfw.wmnet with OS trixie
- 11:48 Emperor: start reef 18.2.7 upgrade of eqiad apus frontends T417396
- 11:39 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host pc1011.eqiad.wmnet with OS trixie
- 11:37 kamila@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 11:37 kamila@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Upgrade to debian trixie
- 11:36 marostegui@cumin1003: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Upgrade to debian trixie
- 11:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1011: Depooling pc1
- 11:35 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 11:34 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 11:34 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc1011: Depooling pc1
- 11:32 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Drop $wgIPReputationEnableLoginCaptchaIfIPKnown (T416941) (duration: 06m 58s)
- 11:28 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 11:28 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
- 11:27 dreamyjazz@deploy2002: dreamyjazz: Backport for Drop $wgIPReputationEnableLoginCaptchaIfIPKnown (T416941) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:25 dreamyjazz@deploy2002: Started scap sync-world: Backport for Drop $wgIPReputationEnableLoginCaptchaIfIPKnown (T416941)
- 11:23 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestagemaster2003.codfw.wmnet with reason: host reimage
- 11:21 Emperor: start reef 18.2.7 upgrade of eqiad apus storage nodes T417396
- 11:10 marostegui@dns1006: END - running authdns-update
- 11:09 marostegui: Failover m5-master T414656
- 11:08 marostegui@dns1006: START - running authdns-update
- 11:04 urbanecm@deploy2002: Finished scap sync-world: Backport for cleanup: Remove unused code, Validate mentor list using a JSON schema (T417422), Temporarily switch mentor list validation to legacy validator (T417422) (duration: 06m 46s)
- 11:02 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestagemaster2003.codfw.wmnet with OS trixie
- 11:00 urbanecm@deploy2002: urbanecm: Continuing with sync
- 10:59 urbanecm@deploy2002: urbanecm: Backport for cleanup: Remove unused code, Validate mentor list using a JSON schema (T417422), Temporarily switch mentor list validation to legacy validator (T417422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 10:58 jayme@cumin1003: START - Cookbook sre.k8s.reimage-stacked-control-plane Reimaging k8s control planes of cluster staging-codfw: trixie upgrade
- 10:57 urbanecm@deploy2002: Started scap sync-world: Backport for cleanup: Remove unused code, Validate mentor list using a JSON schema (T417422), Temporarily switch mentor list validation to legacy validator (T417422)
- 10:57 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:57 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 10:56 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:56 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 10:55 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:55 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 10:52 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:52 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 10:43 kamila@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker2356.codfw.wmnet
- 10:42 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2356.codfw.wmnet
- 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2005.codfw.wmnet with OS bookworm
- 10:41 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 10:41 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 10:36 kamila@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) pool for host wikikube-worker2356.codfw.wmnet
- 10:36 kamila@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker2356.codfw.wmnet
- 10:23 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:21 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 10:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2005.codfw.wmnet with reason: host reimage
- 10:15 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2005.codfw.wmnet with reason: host reimage
- 10:09 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 10:07 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 09:57 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2005.codfw.wmnet with OS bookworm
- 09:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host apus-fe2004.codfw.wmnet with OS bookworm
- 09:57 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:56 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
- 09:55 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 09:54 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 09:42 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "[Growth] Force legacy validation of GrowthMentorList" (T417422) (duration: 06m 00s)
- 09:36 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "[Growth] Force legacy validation of GrowthMentorList" (T417422)
- 09:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on apus-fe2004.codfw.wmnet with reason: host reimage
- 09:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on apus-fe2004.codfw.wmnet with reason: host reimage
- 09:23 urbanecm@deploy2002: urbanecm: Backport for [Growth] Force legacy validation of GrowthMentorList (T417422) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:21 urbanecm@deploy2002: Started scap sync-world: Backport for [Growth] Force legacy validation of GrowthMentorList (T417422)
- 09:18 kharlan@deploy2002: Finished scap sync-world: Backport for HCaptchaEnterpriseHealthChecker: Use a cache hit for health check (T412947) (duration: 08m 43s)
- 09:14 kharlan@deploy2002: kharlan: Continuing with sync
- 09:13 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2004.codfw.wmnet with OS bookworm
- 09:11 kharlan@deploy2002: kharlan: Backport for HCaptchaEnterpriseHealthChecker: Use a cache hit for health check (T412947) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:10 kharlan@deploy2002: Started scap sync-world: Backport for HCaptchaEnterpriseHealthChecker: Use a cache hit for health check (T412947)
- 08:56 mszwarc@deploy2002: Finished scap sync-world: Backport for Squashed diff to master, Lift IP cap for Editathon on commonswiki, eswiki (T417830), IPoid: Retry on intermittent network errors in OpenSearch fetcher (T417908), IPReputation: Lower IPoid request and connect timeouts (T417910) (duration: 12m 05s)
- 08:53 hashar@deploy2002: Finished deploy [integration/docroot@1641910]: build: update misc dependencies 50ce133 11dba19 1641910 (duration: 00m 12s)
- 08:53 hashar@deploy2002: Started deploy [integration/docroot@1641910]: build: update misc dependencies 50ce133 11dba19 1641910
- 08:50 mszwarc@deploy2002: mlitn, kharlan, anzx, mszwarc: Continuing with sync
- 08:48 mszwarc@deploy2002: mlitn, kharlan, anzx, mszwarc: Backport for Squashed diff to master, Lift IP cap for Editathon on commonswiki, eswiki (T417830), IPoid: Retry on intermittent network errors in OpenSearch fetcher (T417908), IPReputation: Lower IPoid request and connect timeouts (T417910) synced to the testservers (see https://wikitech.wikime
- 08:44 mszwarc@deploy2002: Started scap sync-world: Backport for Squashed diff to master, Lift IP cap for Editathon on commonswiki, eswiki (T417830), IPoid: Retry on intermittent network errors in OpenSearch fetcher (T417908), IPReputation: Lower IPoid request and connect timeouts (T417910)
- 08:43 mszwarc@deploy2002: Finished scap sync-world: Backport for Ensure that sysops don't have '(oathauth-recover-for-user)' right (T417877), cirrus: enable default_sort for completion on a set of wikis (T404858) (duration: 37m 07s)
- 08:30 mszwarc@deploy2002: mszwarc, dcausse: Continuing with sync
- 08:29 mszwarc@deploy2002: mszwarc, dcausse: Backport for Ensure that sysops don't have '(oathauth-recover-for-user)' right (T417877), cirrus: enable default_sort for completion on a set of wikis (T404858) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1249 (T415786)', diff saved to https://phabricator.wikimedia.org/P88979 and previous config saved to /var/cache/conftool/dbconfig/20260223-080657-marostegui.json
- 08:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
- 08:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T415786)', diff saved to https://phabricator.wikimedia.org/P88978 and previous config saved to /var/cache/conftool/dbconfig/20260223-080644-marostegui.json
- 08:06 mszwarc@deploy2002: Started scap sync-world: Backport for Ensure that sysops don't have '(oathauth-recover-for-user)' right (T417877), cirrus: enable default_sort for completion on a set of wikis (T404858)
- 07:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P88977 and previous config saved to /var/cache/conftool/dbconfig/20260223-075135-marostegui.json
- 07:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P88976 and previous config saved to /var/cache/conftool/dbconfig/20260223-073627-marostegui.json
- 07:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T415786)', diff saved to https://phabricator.wikimedia.org/P88975 and previous config saved to /var/cache/conftool/dbconfig/20260223-072119-marostegui.json
- 04:42 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2246 (T415786)', diff saved to https://phabricator.wikimedia.org/P88974 and previous config saved to /var/cache/conftool/dbconfig/20260223-044209-marostegui.json
- 04:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2246.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T415786)', diff saved to https://phabricator.wikimedia.org/P88973 and previous config saved to /var/cache/conftool/dbconfig/20260223-044144-marostegui.json
- 04:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P88972 and previous config saved to /var/cache/conftool/dbconfig/20260223-042636-marostegui.json
- 04:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245', diff saved to https://phabricator.wikimedia.org/P88971 and previous config saved to /var/cache/conftool/dbconfig/20260223-041128-marostegui.json
- 03:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2245 (T415786)', diff saved to https://phabricator.wikimedia.org/P88970 and previous config saved to /var/cache/conftool/dbconfig/20260223-035619-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 50s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-22
- 18:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1248 (T415786)', diff saved to https://phabricator.wikimedia.org/P88969 and previous config saved to /var/cache/conftool/dbconfig/20260222-183605-marostegui.json
- 18:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
- 18:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88968 and previous config saved to /var/cache/conftool/dbconfig/20260222-183541-marostegui.json
- 18:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P88967 and previous config saved to /var/cache/conftool/dbconfig/20260222-182032-marostegui.json
- 18:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P88966 and previous config saved to /var/cache/conftool/dbconfig/20260222-180524-marostegui.json
- 17:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88965 and previous config saved to /var/cache/conftool/dbconfig/20260222-175016-marostegui.json
- 14:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2245 (T415786)', diff saved to https://phabricator.wikimedia.org/P88964 and previous config saved to /var/cache/conftool/dbconfig/20260222-144110-marostegui.json
- 14:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2245.codfw.wmnet with reason: Maintenance
- 04:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1247 (T415786)', diff saved to https://phabricator.wikimedia.org/P88963 and previous config saved to /var/cache/conftool/dbconfig/20260222-044859-marostegui.json
- 04:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
- 03:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 03:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T415786)', diff saved to https://phabricator.wikimedia.org/P88962 and previous config saved to /var/cache/conftool/dbconfig/20260222-032412-marostegui.json
- 03:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P88961 and previous config saved to /var/cache/conftool/dbconfig/20260222-030904-marostegui.json
- 02:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237', diff saved to https://phabricator.wikimedia.org/P88960 and previous config saved to /var/cache/conftool/dbconfig/20260222-025355-marostegui.json
- 02:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2237 (T415786)', diff saved to https://phabricator.wikimedia.org/P88959 and previous config saved to /var/cache/conftool/dbconfig/20260222-023847-marostegui.json
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 06s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-21
- 17:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
- 17:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T415786)', diff saved to https://phabricator.wikimedia.org/P88958 and previous config saved to /var/cache/conftool/dbconfig/20260221-172135-marostegui.json
- 17:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P88957 and previous config saved to /var/cache/conftool/dbconfig/20260221-170628-marostegui.json
- 16:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P88956 and previous config saved to /var/cache/conftool/dbconfig/20260221-165120-marostegui.json
- 16:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T415786)', diff saved to https://phabricator.wikimedia.org/P88955 and previous config saved to /var/cache/conftool/dbconfig/20260221-163612-marostegui.json
- 15:41 inflatador: restart wdqs CODFW in response to huge error rates https://w.wiki/Hw9u
- 15:33 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.restart (exit_code=99)
- 15:28 bking@cumin2002: START - Cookbook sre.wdqs.restart
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2237 (T415786)', diff saved to https://phabricator.wikimedia.org/P88954 and previous config saved to /var/cache/conftool/dbconfig/20260221-141809-marostegui.json
- 14:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2237.codfw.wmnet with reason: Maintenance
- 14:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T415786)', diff saved to https://phabricator.wikimedia.org/P88953 and previous config saved to /var/cache/conftool/dbconfig/20260221-141744-marostegui.json
- 14:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P88952 and previous config saved to /var/cache/conftool/dbconfig/20260221-140236-marostegui.json
- 13:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236', diff saved to https://phabricator.wikimedia.org/P88951 and previous config saved to /var/cache/conftool/dbconfig/20260221-134728-marostegui.json
- 13:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2236 (T415786)', diff saved to https://phabricator.wikimedia.org/P88950 and previous config saved to /var/cache/conftool/dbconfig/20260221-133219-marostegui.json
- 08:48 taavi@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging MMiller out of all services on: 2449 hosts
- 03:45 pt1979@cumin2002: END (ERROR) - Cookbook sre.network.tls (exit_code=97) for network device asw1-22-ulsfo
- 03:44 pt1979@cumin2002: START - Cookbook sre.network.tls for network device asw1-22-ulsfo
- 03:39 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 03:39 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt ip for asw1-22-uslfo - pt1979@cumin2002"
- 03:39 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt ip for asw1-22-uslfo - pt1979@cumin2002"
- 03:21 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1243 (T415786)', diff saved to https://phabricator.wikimedia.org/P88949 and previous config saved to /var/cache/conftool/dbconfig/20260221-032125-marostegui.json
- 03:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
- 03:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T415786)', diff saved to https://phabricator.wikimedia.org/P88948 and previous config saved to /var/cache/conftool/dbconfig/20260221-032101-marostegui.json
- 03:13 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 03:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P88947 and previous config saved to /var/cache/conftool/dbconfig/20260221-030552-marostegui.json
- 02:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P88946 and previous config saved to /var/cache/conftool/dbconfig/20260221-025044-marostegui.json
- 02:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T415786)', diff saved to https://phabricator.wikimedia.org/P88945 and previous config saved to /var/cache/conftool/dbconfig/20260221-023536-marostegui.json
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 44s)
- 02:02 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 02:02 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt ip for asw1-23-uslfo - pt1979@cumin2002"
- 02:02 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add mgmt ip for asw1-23-uslfo - pt1979@cumin2002"
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:59 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 01:08 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2236 (T415786)', diff saved to https://phabricator.wikimedia.org/P88944 and previous config saved to /var/cache/conftool/dbconfig/20260221-010845-marostegui.json
- 01:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2236.codfw.wmnet with reason: Maintenance
- 01:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88943 and previous config saved to /var/cache/conftool/dbconfig/20260221-010820-marostegui.json
- 00:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P88942 and previous config saved to /var/cache/conftool/dbconfig/20260221-005312-marostegui.json
- 00:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P88941 and previous config saved to /var/cache/conftool/dbconfig/20260221-003804-marostegui.json
- 00:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88940 and previous config saved to /var/cache/conftool/dbconfig/20260221-002255-marostegui.json
2026-02-20
- 23:03 eevans@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 23:03 eevans@cumin1003: START - Cookbook sre.network.cf
- 23:02 eevans@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 23:02 eevans@cumin1003: START - Cookbook sre.network.cf
- 22:35 eevans@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 22:35 eevans@cumin1003: START - Cookbook sre.network.cf
- 21:10 eevans@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site esams [reason: no reason specified, ]
- 21:10 eevans@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site esams [reason: no reason specified, ]
- 19:16 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site esams [reason: no reason specified, ]
- 19:16 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
- 19:14 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site esams [reason: no reason specified, ]
- 19:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site esams [reason: no reason specified, ]
- 19:12 sukhe@cumin1003: END (PASS) - Cookbook sre.network.cf (exit_code=0)
- 19:12 sukhe@cumin1003: START - Cookbook sre.network.cf
- 19:08 sukhe@cumin1003: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 19:08 sukhe@cumin1003: START - Cookbook sre.network.cf
- 19:07 sukhe@cumin1003: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 19:07 sukhe@cumin1003: START - Cookbook sre.network.cf
- 19:07 sukhe@cumin1003: END (FAIL) - Cookbook sre.network.cf (exit_code=1)
- 19:07 sukhe@cumin1003: START - Cookbook sre.network.cf
- 18:52 dwisehaupt@dns1004: END - running authdns-update
- 18:50 dwisehaupt@dns1004: START - running authdns-update
- 18:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2021.codfw.wmnet with OS bullseye
- 18:42 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2021.codfw.wmnet with OS bullseye
- 17:47 mutante: gerrit added Alex Sanford to wmf-deployment group - already has deployment shell group T418015
- 17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2022.codfw.wmnet with OS bullseye
- 17:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2022.codfw.wmnet with OS bullseye
- 17:14 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 17:14 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 17:14 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 17:13 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 17:13 sbassett@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
- 17:13 sbassett@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
- 17:13 sbassett@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
- 17:13 sbassett@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
- 17:12 sbassett@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
- 17:12 sbassett@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host moss-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-fe2022.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host moss-fe2021.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-fe2022
- 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host moss-fe2021
- 16:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host moss-fe2022
- 16:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host moss-fe2021
- 16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding moss-fe2021 to codfw - jhancock@cumin2002"
- 16:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding moss-fe2021 to codfw - jhancock@cumin2002"
- 16:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 14:36 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on A:wikikube-staging-worker-eqiad
- 14:36 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1006.eqiad.wmnet with OS trixie
- 14:19 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
- 14:12 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1006.eqiad.wmnet with reason: host reimage
- 13:58 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage1006.eqiad.wmnet with OS trixie
- 13:52 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1005.eqiad.wmnet with OS trixie
- 13:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:34 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
- 13:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1242 (T415786)', diff saved to https://phabricator.wikimedia.org/P88926 and previous config saved to /var/cache/conftool/dbconfig/20260220-133216-marostegui.json
- 13:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
- 13:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T415786)', diff saved to https://phabricator.wikimedia.org/P88925 and previous config saved to /var/cache/conftool/dbconfig/20260220-133152-marostegui.json
- 13:30 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1005.eqiad.wmnet with reason: host reimage
- 13:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P88924 and previous config saved to /var/cache/conftool/dbconfig/20260220-131644-marostegui.json
- 13:15 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage1005.eqiad.wmnet with OS trixie
- 13:13 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1004.eqiad.wmnet with OS trixie
- 13:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P88923 and previous config saved to /var/cache/conftool/dbconfig/20260220-130136-marostegui.json
- 13:01 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.roll-reimage-nodes (exit_code=0) rolling reimage on A:wikikube-staging-worker-codfw
- 13:01 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2004.codfw.wmnet with OS trixie
- 12:54 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:51 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1004.eqiad.wmnet with reason: host reimage
- 12:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T415786)', diff saved to https://phabricator.wikimedia.org/P88922 and previous config saved to /var/cache/conftool/dbconfig/20260220-124627-marostegui.json
- 12:39 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
- 12:36 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2004.codfw.wmnet with reason: host reimage
- 12:35 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage1004.eqiad.wmnet with OS trixie
- 12:34 jayme@cumin1003: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on A:wikikube-staging-worker-eqiad
- 12:16 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage2004.codfw.wmnet with OS trixie
- 12:15 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2003.codfw.wmnet with OS trixie
- 11:54 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
- 11:52 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 11:51 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 11:51 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 11:50 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 11:50 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 11:49 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 11:49 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 11:49 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 11:47 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2003.codfw.wmnet with reason: host reimage
- 11:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88918 and previous config saved to /var/cache/conftool/dbconfig/20260220-114437-marostegui.json
- 11:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
- 11:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T415786)', diff saved to https://phabricator.wikimedia.org/P88917 and previous config saved to /var/cache/conftool/dbconfig/20260220-114412-marostegui.json
- 11:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P88916 and previous config saved to /var/cache/conftool/dbconfig/20260220-112903-marostegui.json
- 11:28 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 11:28 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 11:26 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage2003.codfw.wmnet with OS trixie
- 11:20 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2002.codfw.wmnet with OS trixie
- 11:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P88915 and previous config saved to /var/cache/conftool/dbconfig/20260220-111355-marostegui.json
- 11:03 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 10:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T415786)', diff saved to https://phabricator.wikimedia.org/P88914 and previous config saved to /var/cache/conftool/dbconfig/20260220-105847-marostegui.json
- 10:57 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2002.codfw.wmnet with reason: host reimage
- 10:48 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 10:48 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 10:47 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 10:47 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 10:38 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage2002.codfw.wmnet with OS trixie
- 10:37 jayme@cumin1003: START - Cookbook sre.k8s.roll-reimage-nodes rolling reimage on A:wikikube-staging-worker-codfw
- 10:34 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T417862' 'Wikimedia Foundation/Advancement/Community Growth/Community Resources and Partnerships' 'Wikimedia Foundation/Advancement/Community Growth/Community Investment and Partnerships' Ammarpad # T417862
- 10:14 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 10:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 09:41 hashar: Upgraded CI Jenkins from 2.528.3 to 2.541.2 # T417791
- 08:29 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd1003.eqiad.wmnet
- 08:19 brouberol@cumin1003: START - Cookbook sre.hosts.reboot-single for host cephosd1003.eqiad.wmnet
- 07:18 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
- 07:13 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
- 01:49 zabe@deploy2002: Finished scap sync-world: Backport for Start reading from new file tables on mediawikiwiki (T416548) (duration: 07m 17s)
- 01:45 zabe@deploy2002: zabe: Continuing with sync
- 01:44 zabe@deploy2002: zabe: Backport for Start reading from new file tables on mediawikiwiki (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:41 zabe@deploy2002: Started scap sync-world: Backport for Start reading from new file tables on mediawikiwiki (T416548)
- 00:33 ryankemper: [WDQS] Restarted blazegraph on `wdqs1014` as well. all 3 hosts were deadlocked
- 00:32 ryankemper: [WDQS] Restarted blazegraph on `wdqs101[1,3]`
2026-02-19
- 23:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1241 (T415786)', diff saved to https://phabricator.wikimedia.org/P88911 and previous config saved to /var/cache/conftool/dbconfig/20260219-234101-marostegui.json
- 23:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
- 23:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T415786)', diff saved to https://phabricator.wikimedia.org/P88910 and previous config saved to /var/cache/conftool/dbconfig/20260219-234036-marostegui.json
- 23:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P88909 and previous config saved to /var/cache/conftool/dbconfig/20260219-232528-marostegui.json
- 23:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2210 (T415786)', diff saved to https://phabricator.wikimedia.org/P88908 and previous config saved to /var/cache/conftool/dbconfig/20260219-231101-marostegui.json
- 23:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
- 23:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88907 and previous config saved to /var/cache/conftool/dbconfig/20260219-231037-marostegui.json
- 23:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P88906 and previous config saved to /var/cache/conftool/dbconfig/20260219-231020-marostegui.json
- 23:00 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
- 22:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P88904 and previous config saved to /var/cache/conftool/dbconfig/20260219-225529-marostegui.json
- 22:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T415786)', diff saved to https://phabricator.wikimedia.org/P88903 and previous config saved to /var/cache/conftool/dbconfig/20260219-225512-marostegui.json
- 22:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P88902 and previous config saved to /var/cache/conftool/dbconfig/20260219-224020-marostegui.json
- 22:32 egardner@deploy2002: Finished scap sync-world: Backport for Minerva TOC: Fix TOC instrumentation selectors (T415611) (duration: 07m 31s)
- 22:27 egardner@deploy2002: egardner: Continuing with sync
- 22:26 egardner@deploy2002: egardner: Backport for Minerva TOC: Fix TOC instrumentation selectors (T415611) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88901 and previous config saved to /var/cache/conftool/dbconfig/20260219-222512-marostegui.json
- 22:24 egardner@deploy2002: Started scap sync-world: Backport for Minerva TOC: Fix TOC instrumentation selectors (T415611)
- 22:24 ryankemper: T415696 Will be merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1237142 shortly, which will permanently decom the LDF endpoint for wdqs services
- 22:16 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
- 22:14 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop test cluster
- 22:14 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
- 21:35 cscott@deploy2002: Finished scap sync-world: Backport for Enable parser survey for opted out users on some English-language wikis (T414852), Deploy PRV to 19 wikis (T417349) (duration: 10m 28s)
- 21:33 jhathaway@dns1004: END - running authdns-update
- 21:32 jhathaway@dns1004: START - running authdns-update
- 21:31 cscott@deploy2002: cscott, arlolra: Continuing with sync
- 21:26 cscott@deploy2002: cscott, arlolra: Backport for Enable parser survey for opted out users on some English-language wikis (T414852), Deploy PRV to 19 wikis (T417349) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:24 cscott@deploy2002: Started scap sync-world: Backport for Enable parser survey for opted out users on some English-language wikis (T414852), Deploy PRV to 19 wikis (T417349)
- 21:16 arlolra@deploy2002: Finished scap sync-world: Backport for Update Qids according to communication with communities (v20260219) (T417902), Fix finding joiner in the face of pwrapping (T411935) (duration: 07m 16s)
- 21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2005.codfw.wmnet with OS bookworm
- 21:12 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2004.codfw.wmnet with OS bookworm
- 21:12 arlolra@deploy2002: arlolra, aude: Continuing with sync
- 21:10 arlolra@deploy2002: arlolra, aude: Backport for Update Qids according to communication with communities (v20260219) (T417902), Fix finding joiner in the face of pwrapping (T411935) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:08 arlolra@deploy2002: Started scap sync-world: Backport for Update Qids according to communication with communities (v20260219) (T417902), Fix finding joiner in the face of pwrapping (T411935)
- 20:24 cdanis@dns1004: END - running authdns-update
- 20:22 cdanis@dns1004: START - running authdns-update
- 20:05 cdanis@dns1004: END - running authdns-update
- 20:03 cdanis@dns1004: START - running authdns-update
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2005.codfw.wmnet with OS bookworm
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2004.codfw.wmnet with OS bookworm
- 19:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:10 dancy@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.16 refs T413807
- 18:03 hashar: Hard restarting Zuul and flushing all changes currently in the queue
- 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2020.codfw.wmnet with OS trixie
- 17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2019.codfw.wmnet with OS trixie
- 17:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:30 inflatador: bking@wmf restart bg on wdqs2022.codfw.wmnet,wdqs2014.codfw.wmnet,wdqs2007.codfw.wmnet to clear ProbeDown alerts
- 17:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2020.codfw.wmnet with reason: host reimage
- 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2019.codfw.wmnet with reason: host reimage
- 17:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2020.codfw.wmnet with reason: host reimage
- 17:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2019.codfw.wmnet with reason: host reimage
- 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2020.codfw.wmnet with OS trixie
- 16:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2019.codfw.wmnet with OS trixie
- 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:42 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:42 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 15:35 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 15:34 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 15:30 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:28 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:26 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart (exit_code=0) rolling restart_daemons on A:dnsbox and (A:dnsbox)
- 15:01 ladsgroup@deploy2002: Finished scap sync-world: Backport for OutputPage: Sort language links before storing them (T253764), OutputPage: Sort language links before storing them (T253764) (duration: 08m 06s)
- 14:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 14:55 ladsgroup@deploy2002: ladsgroup: Backport for OutputPage: Sort language links before storing them (T253764), OutputPage: Sort language links before storing them (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:53 ladsgroup@deploy2002: Started scap sync-world: Backport for OutputPage: Sort language links before storing them (T253764), OutputPage: Sort language links before storing them (T253764)
- 14:37 jgreen@dns1004: END - running authdns-update
- 14:36 jgreen@dns1004: START - running authdns-update
- 14:36 jgiannelos@deploy2002: Finished scap sync-world: Backport for proofreadpage: Enable parsoid for rendering extension output (T408915) (duration: 07m 41s)
- 14:32 jgiannelos@deploy2002: jgiannelos: Continuing with sync
- 14:30 jgiannelos@deploy2002: jgiannelos: Backport for proofreadpage: Enable parsoid for rendering extension output (T408915) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:28 jgiannelos@deploy2002: Started scap sync-world: Backport for proofreadpage: Enable parsoid for rendering extension output (T408915)
- 14:27 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart rolling restart_daemons on A:dnsbox and (A:dnsbox)
- 14:26 jgiannelos@deploy2002: Finished scap sync-world: Backport for parsoid: Override test config for parsoid testing env on k8s (T386246) (duration: 07m 23s)
- 14:23 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 14:22 jgiannelos@deploy2002: jgiannelos: Continuing with sync
- 14:20 jgiannelos@deploy2002: jgiannelos: Backport for parsoid: Override test config for parsoid testing env on k8s (T386246) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:18 jgiannelos@deploy2002: Started scap sync-world: Backport for parsoid: Override test config for parsoid testing env on k8s (T386246)
- 14:17 kart_: Updated Recommendation API to 2026-02-10-184357-production (T409482)
- 14:10 mlitn@deploy2002: Finished scap sync-world: Backport for Squashed diff to master (duration: 06m 43s)
- 14:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 14:07 mlitn@deploy2002: mlitn: Continuing with sync
- 14:06 mlitn@deploy2002: mlitn: Backport for Squashed diff to master synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:04 mlitn@deploy2002: Started scap sync-world: Backport for Squashed diff to master
- 14:02 kartik@deploy2002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:54 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 13:54 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 13:53 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 13:53 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
- 13:49 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 13:48 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 13:48 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:47 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 13:46 cgoubert@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
- 13:46 cgoubert@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
- 13:45 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
- 13:45 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
- 13:45 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 13:44 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
- 13:44 kartik@deploy2002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:44 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
- 13:43 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
- 13:43 cgoubert@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 13:42 cgoubert@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
- 13:29 kartik@deploy2002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
- 13:27 jiji@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM wikikube-worker-exp2001.codfw.wmnet
- 13:25 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 13:24 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 13:20 jiji@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM wikikube-worker-exp2001.codfw.wmnet
- 13:20 jiji@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM wikikube-worker-exp1001.eqiad.wmnet
- 13:14 jiji@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM wikikube-worker-exp1001.eqiad.wmnet
- 13:13 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 13:12 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 13:12 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit2003.wikimedia.org to gerrit1003.wikimedia.org
- 13:12 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
- 13:05 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
- 12:59 jayme@cumin1003: END (FAIL) - Cookbook sre.k8s.pool-depool-node (exit_code=99) depool for host kubestage1003.eqiad.wmnet
- 12:56 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 12:56 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 12:56 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 12:55 jayme: imported linux-base 4.12.1+wmf1 to trixie-wikimedia - T417632
- 12:50 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 12:47 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 12:46 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 12:46 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 12:44 mlitn@deploy2002: Finished scap sync-world: Backport for Shared stream for reader experiments (T415611) (duration: 07m 42s)
- 12:43 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 12:40 mlitn@deploy2002: mfossati, mlitn: Continuing with sync
- 12:38 mlitn@deploy2002: mfossati, mlitn: Backport for Shared stream for reader experiments (T415611) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:36 mlitn@deploy2002: Started scap sync-world: Backport for Shared stream for reader experiments (T415611)
- 12:33 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 11:48 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 11:48 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 11:44 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
- 11:38 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
- 11:34 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 11:34 jiji@deploy2002: Finished scap sync-world: switching mw-parsoid to pinkllama releases (T386246) (duration: 06m 12s)
- 11:34 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 11:32 jiji@deploy2002: jiji: Continuing with sync
- 11:30 jiji@deploy2002: jiji: switching mw-parsoid to pinkllama releases (T386246) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:29 jiji@deploy2002: Started scap sync-world: switching mw-parsoid to pinkllama releases (T386246)
- 10:54 moritzm: installing gnutls28 security updates
- 10:19 marostegui@dns1006: END - running authdns-update
- 10:17 marostegui@dns1006: START - running authdns-update
- 10:07 marostegui: Switchover m2 master proxy host from dbproxy1023 to dbproxy1025 T414656
- 10:06 marostegui@dns1006: START - running authdns-update
- 09:46 hashar: upgraded Jenkins 2.528.3 to 2.541.2 on contint2002 (Jenkins does not run there) Upgrade + T417791
- 09:43 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1238 (T415786)', diff saved to https://phabricator.wikimedia.org/P88896 and previous config saved to /var/cache/conftool/dbconfig/20260219-094258-marostegui.json
- 09:42 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
- 09:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88895 and previous config saved to /var/cache/conftool/dbconfig/20260219-094234-marostegui.json
- 09:37 XioNoX: lsw1-d7-eqiad# tools network-instance default protocols bgp neighbor 10.64.128.17 reset-peer - T411054
- 09:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P88893 and previous config saved to /var/cache/conftool/dbconfig/20260219-092726-marostegui.json
- 09:23 hashar@deploy2002: Finished scap sync-world: Backport for Do not pass null to AccessTokenEntity::setUserIdentifier() (T417820), Fix "iss" field missing in OAuth 2 access token JWT (T417839) (duration: 08m 37s)
- 09:19 hashar@deploy2002: reedy, jforrester, hashar: Continuing with sync
- 09:17 hashar@deploy2002: reedy, jforrester, hashar: Backport for Do not pass null to AccessTokenEntity::setUserIdentifier() (T417820), Fix "iss" field missing in OAuth 2 access token JWT (T417839) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:14 hashar@deploy2002: Started scap sync-world: Backport for Do not pass null to AccessTokenEntity::setUserIdentifier() (T417820), Fix "iss" field missing in OAuth 2 access token JWT (T417839)
- 09:14 hashar: Deploying https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1240406 config change for beta which was left unfetched/undeployed :)
- 09:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P88891 and previous config saved to /var/cache/conftool/dbconfig/20260219-091217-marostegui.json
- 09:11 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 09:10 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 09:10 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 09:09 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 08:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88890 and previous config saved to /var/cache/conftool/dbconfig/20260219-085723-marostegui.json
- 08:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
- 08:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88889 and previous config saved to /var/cache/conftool/dbconfig/20260219-085709-marostegui.json
- 07:47 brouberol@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 07:46 brouberol@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 57s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:51 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
2026-02-18
- 23:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 23:54 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
- 23:54 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
- 23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 23:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 23:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:46 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 23:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 23:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 23:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
- 23:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 23:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 23:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 22:51 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw2004-dev.codfw.wmnet with reason: host reimage
- 22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw2004-dev.codfw.wmnet with OS trixie
- 22:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cloudcephosd2008-dev.codfw.wmnet with OS bookworm
- 22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudgw2004-dev
- 22:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
- 22:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd2008-dev
- 22:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2008-dev
- 22:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:56 ladsgroup@deploy2002: Finished scap sync-world: Backport for Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828) (duration: 07m 14s)
- 21:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:52 ladsgroup@deploy2002: ladsgroup, somerandomdeveloper: Continuing with sync
- 21:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 21:51 ladsgroup@deploy2002: ladsgroup, somerandomdeveloper: Backport for Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:49 ladsgroup@deploy2002: Started scap sync-world: Backport for Revert "Support CSS/JS thumbnail sizing in Parsoid" (T417828)
- 21:35 kemayo@deploy2002: Finished scap sync-world: Backport for BaseEditCheck: fix check for blockquote (T417801), Add instrument for clicks in TOC references link (T415910), Add instrument for clicks in footnotes in the article (T415909) (duration: 07m 56s)
- 21:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
- 21:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88887 and previous config saved to /var/cache/conftool/dbconfig/20260218-213149-marostegui.json
- 21:31 kemayo@deploy2002: kemayo, thiemowmde: Continuing with sync
- 21:29 kemayo@deploy2002: kemayo, thiemowmde: Backport for BaseEditCheck: fix check for blockquote (T417801), Add instrument for clicks in TOC references link (T415910), Add instrument for clicks in footnotes in the article (T415909) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:27 kemayo@deploy2002: Started scap sync-world: Backport for BaseEditCheck: fix check for blockquote (T417801), Add instrument for clicks in TOC references link (T415910), Add instrument for clicks in footnotes in the article (T415909)
- 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2018.codfw.wmnet with OS trixie
- 21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P88885 and previous config saved to /var/cache/conftool/dbconfig/20260218-211640-marostegui.json
- 21:10 sgimeno@deploy2002: Finished scap sync-world: Backport for [Growth] Specify notification delay as int instead of array (T375198 T415536) (duration: 07m 48s)
- 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2017.codfw.wmnet with OS trixie
- 21:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 21:06 sgimeno@deploy2002: sgimeno: Continuing with sync
- 21:05 sgimeno@deploy2002: sgimeno: Backport for [Growth] Specify notification delay as int instead of array (T375198 T415536) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:03 sgimeno@deploy2002: Started scap sync-world: Backport for [Growth] Specify notification delay as int instead of array (T375198 T415536)
- 21:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P88884 and previous config saved to /var/cache/conftool/dbconfig/20260218-210132-marostegui.json
- 20:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2018.codfw.wmnet with reason: host reimage
- 20:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2017.codfw.wmnet with reason: host reimage
- 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2018.codfw.wmnet with reason: host reimage
- 20:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2017.codfw.wmnet with reason: host reimage
- 20:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88883 and previous config saved to /var/cache/conftool/dbconfig/20260218-204624-marostegui.json
- 20:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2018.codfw.wmnet with OS trixie
- 20:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2017.codfw.wmnet with OS trixie
- 20:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2015.codfw.wmnet with OS trixie
- 20:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2016.codfw.wmnet with OS trixie
- 20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 20:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2015.codfw.wmnet with reason: host reimage
- 20:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2015.codfw.wmnet with reason: host reimage
- 19:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2016.codfw.wmnet with reason: host reimage
- 19:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2016.codfw.wmnet with reason: host reimage
- 19:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2015.codfw.wmnet with OS trixie
- 19:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:32 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2016.codfw.wmnet with OS trixie
- 19:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:30 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88882 and previous config saved to /var/cache/conftool/dbconfig/20260218-193017-marostegui.json
- 19:30 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: Maintenance
- 19:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
- 19:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88881 and previous config saved to /var/cache/conftool/dbconfig/20260218-192929-marostegui.json
- 19:28 dancy@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.16 refs T413807
- 19:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P88880 and previous config saved to /var/cache/conftool/dbconfig/20260218-191420-marostegui.json
- 19:05 brett: import haproxykafka 0.3.16+deb13u1 into trixie-wikimedia (T401832)
- 18:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P88879 and previous config saved to /var/cache/conftool/dbconfig/20260218-185912-marostegui.json
- 18:50 ladsgroup@deploy2002: Finished scap sync-world: Backport for Make Pdf thumbs follow the thumb steps (T402792 T414805) (duration: 07m 43s)
- 18:46 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 18:44 ladsgroup@deploy2002: ladsgroup: Backport for Make Pdf thumbs follow the thumb steps (T402792 T414805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88878 and previous config saved to /var/cache/conftool/dbconfig/20260218-184405-marostegui.json
- 18:42 ladsgroup@deploy2002: Started scap sync-world: Backport for Make Pdf thumbs follow the thumb steps (T402792 T414805)
- 18:05 reedy@deploy2002: Finished scap sync-world: Backport for Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), [[gerrit:1240347|Upgrading firebase/php-jwt (v6.11.1 =>
- 18:00 reedy@deploy2002: reedy: Continuing with sync
- 18:00 reedy@deploy2002: reedy: Backport for Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722)
- 17:58 reedy@deploy2002: Started scap sync-world: Backport for Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), Upgrading firebase/php-jwt (v6.11.1 => v7.0.2) (T417722), [[gerrit:1240347|Upgrading firebase/php-jwt (v6.11.1 => v
- 17:12 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins security updates (duration: 01m 32s)
- 17:10 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins security updates
- 16:53 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit1003.wikimedia.org with OS bookworm
- 16:48 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica AND NOT P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 16:34 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 16:30 arnaudb@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 16:30 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins update test on backup host (duration: 01m 48s)
- 16:29 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@863e5c2] (releasing): Jenkins update test on backup host
- 16:22 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica AND NOT P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 16:19 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 16:18 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 16:18 vgutierrez: upload liberica 0.24 to bookworm-wikimedia (apt.wm.o) - T417306
- 16:14 moritzm: upgrade codfw1dev instances of cloudlb and cloudservices* to Bird 2.18 T413740
- 16:12 arnaudb@cumin1003: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
- 16:12 arnaudb@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bookworm
- 16:10 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on wdqs1028.eqiad.wmnet with reason: broken puppet
- 16:03 moritzm: imported jenkins 2.541.2 for bullseye-wikimedia/bookworm-wikimedia
- 15:44 claime: homer 'lsw*codfw*' commit 'T417772'
- 15:42 vgutierrez@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 15:41 vgutierrez@cumin1003: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs7003.magru.wmnet} and A:liberica (T417306)
- 15:40 bjensen: homer 'cr*codfw*' commit 'T417772'
- 15:37 zabe: zabe@deploy2002:~$ mwscript extensions/TimedMediaHandler/maintenance/migrateTranscodeStates.php mediawikiwiki # T415064
- 15:35 vgutierrez: upload liberica 0.23 to bookworm-wikimedia (apt.wm.o) - T417306
- 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for cloudgw2004 and cloudcephosd2008-dev - pt1979@cumin2002"
- 15:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns for cloudgw2004 and cloudcephosd2008-dev - pt1979@cumin2002"
- 15:24 pt1979@cumin2002: START - Cookbook sre.dns.netbox
- 15:24 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 15:22 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 15:18 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 15:17 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:16 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 15:16 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:16 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:15 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:15 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:14 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:10 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:09 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:09 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:02 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 14:59 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 14:54 vgutierrez: uploaded tcp-mss-clamper 0.6+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832
- 14:48 vgutierrez: upload golang-gitlab-wikimedia-sre-qemutest-dev 0.1.0+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832
- 14:39 vgutierrez: upload golang-github-u-root-u-root 0.12.0-1 to trixie-wikimedia (apt.wm.o) - T401832
- 14:15 mszwarc@deploy2002: Finished scap sync-world: Backport for ruwikisource: EnableProtectionIndicators (T417590), Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883) (duration: 08m 10s)
- 14:11 mszwarc@deploy2002: anzx, mszwarc: Continuing with sync
- 14:09 mszwarc@deploy2002: anzx, mszwarc: Backport for ruwikisource: EnableProtectionIndicators (T417590), Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 mszwarc@deploy2002: Started scap sync-world: Backport for ruwikisource: EnableProtectionIndicators (T417590), Add '(oathauth-recover-for-user)' to 'wmf-supportsafety' (T415883)
- 13:24 arnaudb@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on gerrit1003.wikimedia.org with reason: T417246
- 12:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wdqs1028.eqiad.wmnet with OS bookworm
- 12:45 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage1003.eqiad.wmnet
- 12:45 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage1003.eqiad.wmnet
- 12:43 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage1003.eqiad.wmnet with OS trixie
- 12:24 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 12:16 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage1003.eqiad.wmnet with reason: host reimage
- 12:12 logmsgbot: dreamyjazz Deployed security patch for T411366
- 12:06 logmsgbot: dreamyjazz Deployed security patch for T411366
- 12:01 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage1003.eqiad.wmnet with OS trixie
- 11:59 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage1003.eqiad.wmnet
- 11:57 vgutierrez: upload golang-github-intel-go-cpuid 0.0~git20210602.5747e5c-2+deb13u1 to trixie-wikimedia (apt.wm.o) - T401832
- 11:54 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage1003.eqiad.wmnet
- 11:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 11:48 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 11:19 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 11:14 arnaudb@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 11:12 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:09 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:07 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "New scope bots - fabfur@cumin1003"
- 11:07 fabfur@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: New scope bots - fabfur@cumin1003
- 11:06 fabfur@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: New scope bots - fabfur@cumin1003
- 11:06 fabfur@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "New scope bots - fabfur@cumin1003"
- 10:56 arnaudb@cumin1003: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
- 10:51 joal@deploy2002: Finished deploy [analytics/refinery@28fa1ea] (thin): Regular analytics weekly train THIN [analytics/refinery@28fa1eac] (duration: 01m 56s)
- 10:49 joal@deploy2002: Started deploy [analytics/refinery@28fa1ea] (thin): Regular analytics weekly train THIN [analytics/refinery@28fa1eac]
- 10:49 joal@deploy2002: Finished deploy [analytics/refinery@28fa1ea]: Regular analytics weekly train [analytics/refinery@28fa1eac] (duration: 04m 06s)
- 10:44 joal@deploy2002: Started deploy [analytics/refinery@28fa1ea]: Regular analytics weekly train [analytics/refinery@28fa1eac]
- 10:44 joal@deploy2002: Finished deploy [analytics/refinery@28fa1ea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28fa1eac] (duration: 01m 57s)
- 10:42 joal@deploy2002: Started deploy [analytics/refinery@28fa1ea] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@28fa1eac]
- 10:41 arnaudb@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit1003.wikimedia.org with OS bookworm
- 10:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin2003.codfw.wmnet
- 10:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin2003.codfw.wmnet
- 09:53 arnaudb@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 09:46 arnaudb@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit1003.wikimedia.org with reason: host reimage
- 09:28 arnaudb@cumin1003: START - Cookbook sre.hosts.reimage for host gerrit1003.wikimedia.org with OS bookworm
- 08:58 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS trixie
- 08:38 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
- 08:32 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
- 08:19 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS trixie
- 05:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2179 (T415786)', diff saved to https://phabricator.wikimedia.org/P88861 and previous config saved to /var/cache/conftool/dbconfig/20260218-053229-marostegui.json
- 05:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Maintenance
- 05:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88860 and previous config saved to /var/cache/conftool/dbconfig/20260218-053204-marostegui.json
- 05:28 kart_: Updated cxserver to 2026-01-20-115813-production (T415038, T415046, T414558)
- 05:25 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
- 05:25 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
- 05:24 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
- 05:24 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
- 05:18 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
- 05:17 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
- 05:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P88859 and previous config saved to /var/cache/conftool/dbconfig/20260218-051656-marostegui.json
- 05:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P88858 and previous config saved to /var/cache/conftool/dbconfig/20260218-050148-marostegui.json
- 04:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88857 and previous config saved to /var/cache/conftool/dbconfig/20260218-044639-marostegui.json
- 03:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1199 (T415786)', diff saved to https://phabricator.wikimedia.org/P88856 and previous config saved to /var/cache/conftool/dbconfig/20260218-032324-marostegui.json
- 03:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
- 03:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88855 and previous config saved to /var/cache/conftool/dbconfig/20260218-032258-marostegui.json
- 03:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P88854 and previous config saved to /var/cache/conftool/dbconfig/20260218-030750-marostegui.json
- 02:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P88853 and previous config saved to /var/cache/conftool/dbconfig/20260218-025242-marostegui.json
- 02:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88852 and previous config saved to /var/cache/conftool/dbconfig/20260218-023733-marostegui.json
- 01:02 zabe@deploy2002: Finished scap sync-world: Backport for Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548) (duration: 11m 16s)
- 00:58 Krinkle: Edit Module:Date on various wikis in attempt to mitigate T416616, T416540. Details at https://phabricator.wikimedia.org/T416616#11625838.
- 00:55 zabe@deploy2002: zabe: Continuing with sync
- 00:55 zabe@deploy2002: zabe: Backport for Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:50 zabe@deploy2002: Started scap sync-world: Backport for Add small comment pointing to ForeignDBViaLBRepo above file migration (T416548)
2026-02-17
- 22:03 kemayo@deploy2002: Finished scap sync-world: Backport for EditCheck: update shown stats on initial page load (T417452), EditCheck: adjust editsuggestion-seen tag (T413419) (duration: 40m 26s)
- 21:50 kemayo@deploy2002: caro, kemayo: Continuing with sync
- 21:50 robh@dns1004: END - running authdns-update
- 21:49 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wdqs1028.eqiad.wmnet with reason: host reimage
- 21:48 robh@dns1004: START - running authdns-update
- 21:47 kemayo@deploy2002: caro, kemayo: Backport for EditCheck: update shown stats on initial page load (T417452), EditCheck: adjust editsuggestion-seen tag (T413419) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:45 jhathaway@dns1004: END - running authdns-update
- 21:44 jhathaway@dns1004: START - running authdns-update
- 21:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wdqs1028.eqiad.wmnet with reason: host reimage
- 21:23 bking@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host wdqs1028
- 21:23 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wdqs1028
- 21:22 kemayo@deploy2002: Started scap sync-world: Backport for EditCheck: update shown stats on initial page load (T417452), EditCheck: adjust editsuggestion-seen tag (T413419)
- 21:21 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wdqs1028
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wdqs1028.eqiad.wmnet 6.48.64.10.in-addr.arpa 6.0.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 21:21 bking@cumin2002: START - Cookbook sre.dns.wipe-cache wdqs1028.eqiad.wmnet 6.48.64.10.in-addr.arpa 6.0.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:21 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1028 - bking@cumin2002"
- 21:21 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host wdqs1028 - bking@cumin2002"
- 21:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host apus-fe2005.codfw.wmnet with OS bookworm
- 21:16 bking@cumin2002: START - Cookbook sre.dns.netbox
- 21:16 bking@cumin2002: START - Cookbook sre.hosts.move-vlan for host wdqs1028
- 21:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host wdqs1028.eqiad.wmnet with OS bookworm
- 20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:40 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host frdb1008
- 20:40 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
- 20:39 vriley@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host frdb1008
- 20:39 vriley@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host frdb1008
- 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:38 vriley@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [frdb1008] - vriley@cumin1003"
- 20:38 vriley@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [frdb1008] - vriley@cumin1003"
- 20:35 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 20:35 vriley@cumin1003: START - Cookbook sre.dns.netbox
- 20:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:33 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 20:33 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 20:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
- 20:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
- 20:31 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:30 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 20:29 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 20:29 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-fe2005.codfw.wmnet with OS bookworm
- 20:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host apus-2004.codfw.wmnet with OS bookworm
- 20:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['apus-fe2004']
- 20:25 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['apus-fe2004']
- 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-fe2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host apus-2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2005
- 20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2005
- 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host apus-fe2004
- 20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host apus-fe2004
- 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 20:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding apus-fe2004 to codfw - jhancock@cumin2002"
- 20:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin2003.codfw.wmnet with OS trixie
- 19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin2003.codfw.wmnet with reason: host reimage
- 19:36 cjd91: cdobbins@apt1002 import fifo-log-demux 0.7.5+deb13u1 into trixie-wikimedia
- 19:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin2003.codfw.wmnet with reason: host reimage
- 19:31 dancy@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.16 refs T413807
- 19:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host cumin2003.codfw.wmnet with OS trixie
- 19:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cumin2003']
- 19:16 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cumin2003']
- 19:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cumin2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:12 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy InterwikiSorting - V: Stop loading i18n (T253764) (duration: 41m 15s)
- 19:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cumin2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cumin2003
- 19:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cumin2003
- 18:59 ladsgroup@deploy2002: jforrester, ladsgroup: Continuing with sync
- 18:57 ladsgroup@deploy2002: jforrester, ladsgroup: Backport for Undeploy InterwikiSorting - V: Stop loading i18n (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2017.codfw.wmnet with OS trixie
- 18:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2016.codfw.wmnet with OS trixie
- 18:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy InterwikiSorting - V: Stop loading i18n (T253764)
- 18:21 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy InterwikiSorting - IV: Drop all config (T253764) (duration: 07m 57s)
- 18:17 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
- 18:15 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Undeploy InterwikiSorting - IV: Drop all config (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:13 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy InterwikiSorting - IV: Drop all config (T253764)
- 18:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764) (duration: 07m 10s)
- 18:00 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
- 17:59 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:57 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy InterwikiSorting - III: Drop InterwikiSortOrders.php (T253764)
- 17:57 brett: Import python-logstash (python3-logstash) 0.4.6~deb13u1 to trixie-wikimedia (T401832)
- 17:54 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy InterwikiSorting - II: Drop loading ability (T253764) (duration: 07m 52s)
- 17:50 ladsgroup@deploy2002: jforrester, ladsgroup: Continuing with sync
- 17:48 ladsgroup@deploy2002: jforrester, ladsgroup: Backport for Undeploy InterwikiSorting - II: Drop loading ability (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:46 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy InterwikiSorting - II: Drop loading ability (T253764)
- 17:41 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 compact-language-links
- 17:41 ladsgroup@deploy2002: Finished scap sync-world: Backport for Undeploy InterwikiSorting - I: Disable everywhere (T253764) (duration: 06m 58s)
- 17:37 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
- 17:36 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Undeploy InterwikiSorting - I: Disable everywhere (T253764) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2017.codfw.wmnet with OS trixie
- 17:34 ladsgroup@deploy2002: Started scap sync-world: Backport for Undeploy InterwikiSorting - I: Disable everywhere (T253764)
- 17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2016.codfw.wmnet with OS trixie
- 17:17 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup2016']
- 17:17 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2016']
- 17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:06 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ms-be[2057-2061].codfw.wmnet
- 17:06 elukey@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:06 elukey@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2057-2061].codfw.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin2002"
- 17:06 elukey@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ms-be[2057-2061].codfw.wmnet decommissioned, removing all IPs except the asset tag one - elukey@cumin2002"
- 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2346.codfw.wmnet with OS bookworm
- 17:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
- 17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:02 elukey@cumin2002: START - Cookbook sre.dns.netbox
- 16:48 elukey@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2057-2061].codfw.wmnet
- 16:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
- 16:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2015.codfw.wmnet with OS trixie
- 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2346.codfw.wmnet with reason: host reimage
- 16:36 brennen@deploy2002: Finished deploy [phabricator/deployment@aad109e]: deploy phab1004 for T417657 (duration: 01m 08s)
- 16:35 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be2057.codfw.wmnet
- 16:35 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:34 brennen@deploy2002: Started deploy [phabricator/deployment@aad109e]: deploy phab1004 for T417657
- 16:34 brennen@deploy2002: Finished deploy [phabricator/deployment@aad109e]: deploy phab2002 for T417657 (duration: 00m 31s)
- 16:33 brennen@deploy2002: Started deploy [phabricator/deployment@aad109e]: deploy phab2002 for T417657
- 16:33 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab2002.codfw.wmnet with reason: deployment
- 16:33 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on phab1004.eqiad.wmnet with reason: deployment
- 16:32 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cumin2003 to codfw - jhancock@cumin2002"
- 16:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cumin2003 to codfw - jhancock@cumin2002"
- 16:30 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2346.codfw.wmnet with OS bookworm
- 16:28 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 16:28 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be2057.codfw.wmnet
- 16:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 16:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 16:25 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 16:25 brouberol@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 16:21 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ms-be[2057-2061].codfw.wmnet
- 16:21 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:21 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudgw2004-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
- 16:20 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
- 16:19 mvernon@cumin2002: START - Cookbook sre.dns.netbox
- 16:15 mvernon@cumin2002: START - Cookbook sre.hosts.decommission for hosts ms-be[2057-2061].codfw.wmnet
- 16:14 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 16:13 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 16:07 brouberol@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 16:06 brouberol@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudceph2008-dev in codfw - jhancock@cumin2002"
- 16:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: updating cloudceph2008-dev in codfw - jhancock@cumin2002"
- 15:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
- 15:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
- 15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudgw2004-dev
- 15:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudgw2004-dev
- 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudgw2004-dev to codfw - jhancock@cumin2002"
- 15:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudgw2004-dev to codfw - jhancock@cumin2002"
- 15:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:55 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 15:54 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 15:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 15:49 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 15:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:48 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 15:48 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:48 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:46 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 15:46 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:45 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:43 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:43 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 15:43 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:43 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:43 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:35 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
- 15:35 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
- 15:31 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 15:31 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 15:31 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 15:31 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 15:31 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 15:31 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 15:31 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 15:31 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 15:31 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 15:30 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 15:30 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-codfw@codfw
- 15:30 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 15:30 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 15:30 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 15:29 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 15:29 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 15:29 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 15:29 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 15:27 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 15:26 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 15:16 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 15:16 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 15:14 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
- 15:13 ladsgroup@deploy2002: Finished scap sync-world: Backport for sqwiki: remove editor usergroup (T415196) (duration: 07m 45s)
- 15:11 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-codfw@codfw
- 15:11 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for alias: wikikube-staging-worker-eqiad@eqiad
- 15:11 jayme@cumin1003: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:09 jayme@cumin1003: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
- 15:09 ladsgroup@deploy2002: ladsgroup, anzx: Continuing with sync
- 15:08 ladsgroup@deploy2002: ladsgroup, anzx: Backport for sqwiki: remove editor usergroup (T415196) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:07 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: XioNoX: maint work done, T416442]
- 15:07 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: XioNoX: maint work done, T416442]
- 15:06 sukhe@cumin1003: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: pool site magru [reason: Xionix maint work done, T416442]
- 15:06 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: Xionix maint work done, T416442]
- 15:06 jayme@cumin1003: START - Cookbook sre.loadbalancer.migrate-service-ipip for alias: wikikube-staging-worker-eqiad@eqiad
- 15:05 ladsgroup@deploy2002: Started scap sync-world: Backport for sqwiki: remove editor usergroup (T415196)
- 15:02 phuedx@deploy2002: Finished scap sync-world: Backport for Test Kitchen: Set event intake service name (duration: 11m 56s)
- 15:01 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 40 hosts
- 15:01 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 40 hosts
- 14:59 sukhe@dns1004: END - running authdns-update
- 14:58 sukhe: running authdns-update after magru depool
- 14:58 sukhe@dns1004: START - running authdns-update
- 14:58 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=magru [reason: magru maintenance done]
- 14:58 vgutierrez: upload golang-github-mmatczuk-anyflag-dev 0.0~git20240709.eb9e24c-1 to trixie-wikimedia (apt.wm.o) - T401832
- 14:57 phuedx@deploy2002: phuedx: Continuing with sync
- 14:55 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:54 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:52 phuedx@deploy2002: phuedx: Backport for Test Kitchen: Set event intake service name synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:50 phuedx@deploy2002: Started scap sync-world: Backport for Test Kitchen: Set event intake service name
- 14:47 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:44 XioNoX: mr1-magru> request system reboot - T416442
- 14:36 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:35 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:34 ladsgroup@deploy2002: Finished scap sync-world: Backport for lift IP cap for event at Tshwane University of Technology (T417578) (duration: 06m 45s)
- 14:30 ladsgroup@deploy2002: anzx, ladsgroup: Continuing with sync
- 14:29 ladsgroup@deploy2002: anzx, ladsgroup: Backport for lift IP cap for event at Tshwane University of Technology (T417578) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:27 ladsgroup@deploy2002: Started scap sync-world: Backport for lift IP cap for event at Tshwane University of Technology (T417578)
- 14:26 brouberol@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 14:26 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ml-serve1014.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 14:23 XioNoX: asw1-b4-magru> request system reboot - T416442
- 14:23 brouberol@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 14:16 ladsgroup@deploy2002: Finished scap sync-world: Backport for Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236) (duration: 08m 13s)
- 14:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2172 (T415786)', diff saved to https://phabricator.wikimedia.org/P88846 and previous config saved to /var/cache/conftool/dbconfig/20260217-141510-marostegui.json
- 14:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
- 14:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T415786)', diff saved to https://phabricator.wikimedia.org/P88845 and previous config saved to /var/cache/conftool/dbconfig/20260217-141457-marostegui.json
- 14:12 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw1-b4-magru,asw1-b4-magru IPv6,asw1-b4-magru.mgmt with reason: router upgrade
- 14:12 ladsgroup@deploy2002: ladsgroup, cscott: Continuing with sync
- 14:10 ladsgroup@deploy2002: ladsgroup, cscott: Backport for Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:08 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on asw1-b3-magru,asw1-b3-magru IPv6,asw1-b3-magru.mgmt with reason: router upgrade
- 14:08 ladsgroup@deploy2002: Started scap sync-world: Backport for Add ParserOutputFlags::PREVENT_SELECTIVE_UPDATE (T348236)
- 14:07 vgutierrez: upload golang-github-florianl-go-tc_0.4.7 to trixie-wikimedia (apt.wm.o) - T401832
- 14:03 XioNoX: asw1-b3-magru> request system reboot - T416442
- 13:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P88844 and previous config saved to /var/cache/conftool/dbconfig/20260217-135949-marostegui.json
- 13:52 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 echo-subscriptions-web-article-linked
- 13:47 XioNoX: cr2-magru> request vmhost reboot - T416442
- 13:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P88843 and previous config saved to /var/cache/conftool/dbconfig/20260217-134440-marostegui.json
- 13:30 ladsgroup@cumin1003: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
- 13:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T415786)', diff saved to https://phabricator.wikimedia.org/P88841 and previous config saved to /var/cache/conftool/dbconfig/20260217-132932-marostegui.json
- 13:25 ladsgroup@cumin1003: START - Cookbook sre.wikireplicas.update-views
- 13:23 XioNoX: cr1-magru> request vmhost reboot - T416442
- 13:13 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 40 hosts with reason: Switches upgrade
- 13:11 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr2-magru,cr2-magru IPv6 with reason: router upgrade
- 13:08 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host wikikube-worker1030.eqiad.wmnet
- 13:08 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host wikikube-worker1030.eqiad.wmnet
- 13:05 cgoubert@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker1030.eqiad.wmnet
- 13:04 cgoubert@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker1030.eqiad.wmnet
- 13:03 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-magru,cr1-magru IPv6 with reason: router upgrade
- 13:02 ayounsi@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 2:00:00 on cr1-magru,cr1-magru IPv6,cr1-magru.mgmt with reason: router upgrade
- 13:01 ayounsi@cumin1003: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=magru [reason: magru maintenance]
- 13:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: no reason specified, T416442]
- 13:00 ayounsi@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: no reason specified, T416442]
- 12:40 reedy@deploy2002: Finished scap sync-world: Backport for CommonSettings.php: Stop loading WebAuthn (T303495), wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495) (duration: 10m 58s)
- 12:34 reedy@deploy2002: reedy: Continuing with sync
- 12:33 reedy@deploy2002: reedy: Backport for CommonSettings.php: Stop loading WebAuthn (T303495), wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:29 reedy@deploy2002: Started scap sync-world: Backport for CommonSettings.php: Stop loading WebAuthn (T303495), wmf-config: Remove $wmgUseWebAuthn and extension from extension-list (T303495)
- 12:15 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 12:05 jiji@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
- 11:59 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 11:55 jiji@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 11:50 jiji@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 11:46 jiji@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 11:42 jiji@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 11:40 jiji@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 11:29 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Take locked users into account for case auto-closure (T417013) (duration: 37m 25s)
- 11:17 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 11:16 dreamyjazz@deploy2002: dreamyjazz: Backport for Take locked users into account for case auto-closure (T417013) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88840 and previous config saved to /var/cache/conftool/dbconfig/20260217-111643-marostegui.json
- 11:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T415786)', diff saved to https://phabricator.wikimedia.org/P88839 and previous config saved to /var/cache/conftool/dbconfig/20260217-111618-marostegui.json
- 11:06 moritzm: upgrading clamav on vrts1003 to 1.4.3
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P88838 and previous config saved to /var/cache/conftool/dbconfig/20260217-110109-marostegui.json
- 10:52 dreamyjazz@deploy2002: Started scap sync-world: Backport for Take locked users into account for case auto-closure (T417013)
- 10:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P88837 and previous config saved to /var/cache/conftool/dbconfig/20260217-104601-marostegui.json
- 10:44 moritzm: upgrading clamav on vrts2002 to 1.4.3
- 10:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T415786)', diff saved to https://phabricator.wikimedia.org/P88836 and previous config saved to /var/cache/conftool/dbconfig/20260217-103053-marostegui.json
- 10:26 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 10:24 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 09:44 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 09:44 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 09:34 trueg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:33 trueg@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
- 09:26 trueg@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:25 trueg@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
- 09:24 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 09:24 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 09:18 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kubestage2001.codfw.wmnet with OS trixie
- 08:58 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 08:56 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 08:56 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 08:54 jayme@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on kubestage2001.codfw.wmnet with reason: host reimage
- 08:48 wmde-fisch@deploy2002: Finished scap sync-world: Backport for Parsoid: Add safeguard when checking for reflist template (T416630) (duration: 10m 51s)
- 08:42 wmde-fisch@deploy2002: wmde-fisch: Continuing with sync
- 08:41 wmde-fisch@deploy2002: wmde-fisch: Backport for Parsoid: Add safeguard when checking for reflist template (T416630) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:37 wmde-fisch@deploy2002: Started scap sync-world: Backport for Parsoid: Add safeguard when checking for reflist template (T416630)
- 08:34 jayme@cumin1003: START - Cookbook sre.hosts.reimage for host kubestage2001.codfw.wmnet with OS trixie
- 08:30 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 08:30 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 05:01 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.13 (duration: 01m 11s)
- 04:47 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.16 refs T413807 (duration: 44m 09s)
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.16 refs T413807
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 01s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-16
- 22:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2155 (T415786)', diff saved to https://phabricator.wikimedia.org/P88834 and previous config saved to /var/cache/conftool/dbconfig/20260216-222716-marostegui.json
- 22:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
- 22:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T415786)', diff saved to https://phabricator.wikimedia.org/P88833 and previous config saved to /var/cache/conftool/dbconfig/20260216-222651-marostegui.json
- 22:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P88832 and previous config saved to /var/cache/conftool/dbconfig/20260216-221143-marostegui.json
- 21:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P88831 and previous config saved to /var/cache/conftool/dbconfig/20260216-215635-marostegui.json
- 21:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T415786)', diff saved to https://phabricator.wikimedia.org/P88830 and previous config saved to /var/cache/conftool/dbconfig/20260216-214127-marostegui.json
- 19:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1160 (T415786)', diff saved to https://phabricator.wikimedia.org/P88829 and previous config saved to /var/cache/conftool/dbconfig/20260216-194026-marostegui.json
- 19:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
- 19:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 19:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 19:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
- 19:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
- 19:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
- 19:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
- 19:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 19:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 19:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 19:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 19:07 zabe: zabe@deploy2002:~$ mwscript extensions/TimedMediaHandler/maintenance/migrateTranscodeStates.php testwiki --force # T415064
- 18:09 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:09 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:08 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:08 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:08 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:08 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:31 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:28 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 17:25 vgutierrez: repool cp7001 - T417536
- 16:52 vgutierrez: depool cp7001 - T417536
- 15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
- 15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
- 15:35 fceratto@dns1004: END - running authdns-update
- 15:33 fceratto@dns1004: START - running authdns-update
- 15:32 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:07 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 15:00 zabe@deploy2002: Finished scap sync-world: Backport for Start reading from il_target_id on commonswiki (T413669) (duration: 11m 31s)
- 14:54 zabe@deploy2002: zabe: Continuing with sync
- 14:53 zabe@deploy2002: zabe: Backport for Start reading from il_target_id on commonswiki (T413669) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:49 zabe@deploy2002: Started scap sync-world: Backport for Start reading from il_target_id on commonswiki (T413669)
- 14:42 marostegui@dns1006: END - running authdns-update
- 14:41 marostegui: Failover m3 dbproxy (phabricator) T414656
- 14:40 marostegui@dns1006: START - running authdns-update
- 14:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe1024.eqiad.wmnet
- 14:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe1023.eqiad.wmnet
- 14:01 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe1022.eqiad.wmnet
- 14:01 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe1024.eqiad.wmnet
- 14:01 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe1023.eqiad.wmnet
- 14:00 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe1022.eqiad.wmnet
- 13:58 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe1021.eqiad.wmnet
- 13:58 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe1021.eqiad.wmnet
- 13:57 mvernon@cumin2002: conftool action : set/pooled=yes; selector: name=ms-fe2021.eqiad.wmnet
- 13:57 mvernon@cumin2002: conftool action : set/weight=40; selector: name=ms-fe2021.eqiad.wmnet
- 13:52 jayme: All compatible ipblock sources have been migrated from fetch_external_clouds_vendors_nets.py to hiddenparma - T412805
- 13:42 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe[1009-1020].eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 13:42 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 echo-subscriptions-email-edit-thank
- 13:35 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe[1009-1020].eqiad.wmnet} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
- 12:46 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 12:46 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:44 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:44 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:43 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:43 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:42 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:42 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:41 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:41 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:40 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:40 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:39 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:39 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:38 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 12:38 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:37 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:37 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 12:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:36 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 12:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:36 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:34 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 12:34 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 12:26 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 12:26 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 11:53 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=mediawikiwiki --reason 'Requested at phab:T417210' 'Writing systems/Syntax' 'Language Converter/Advanced syntax' Ammarpad # T417210
- 11:40 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:34 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 11:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:28 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "decom puppetmaster1001 - jmm@cumin2002"
- 11:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "decom puppetmaster1001 - jmm@cumin2002"
- 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts puppetmaster1001.eqiad.wmnet
- 11:01 jmm@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=94)
- 10:56 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 10:50 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster1001.eqiad.wmnet
- 10:26 jayme@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1003 - T412805"
- 10:26 jayme@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1003 - T412805
- 10:25 jayme@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1003 - T412805
- 10:25 jayme@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1003 - T412805"
- 10:24 jayme@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1003 - T412805"
- 10:24 jayme@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1003 - T412805
- 10:23 jayme@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: [not really into teleological thinking] - jayme@cumin1003 - T412805
- 10:23 jayme@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "[not really into teleological thinking] - jayme@cumin1003 - T412805"
- 10:20 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:20 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:17 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.failover (exit_code=0) from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 10:16 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
- 10:16 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
- 10:16 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
- 10:15 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
- 10:13 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit2003, replica=gerrit1003)
- 10:13 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit2003, replica=gerrit1003)
- 10:09 arnaudb@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) gerrit.wikimedia.org gerrit-replica.wikimedia.org gerrit.discovery.wmnet on all recursors
- 10:09 arnaudb@cumin1003: START - Cookbook sre.dns.wipe-cache gerrit.wikimedia.org gerrit-replica.wikimedia.org gerrit.discovery.wmnet on all recursors
- 10:08 arnaudb@dns1004: END - running authdns-update
- 10:03 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 10:01 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 10:00 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org
- 10:00 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org
- 09:59 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit2003.wikimedia.org
- 09:59 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit2003.wikimedia.org
- 09:59 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.read-only-toggle (exit_code=0) from gerrit1003.wikimedia.org
- 09:54 arnaudb@cumin1003: START - Cookbook sre.gerrit.read-only-toggle from gerrit1003.wikimedia.org
- 09:50 arnaudb@dns1004: START - running authdns-update
- 09:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.failover from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 09:50 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.topology-check (exit_code=0) Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
- 09:50 arnaudb@cumin1003: START - Cookbook sre.gerrit.topology-check Validate Gerrit topology (source=gerrit1003, replica=gerrit2003)
- 09:35 hashar@deploy2002: Finished scap sync-world: Backport for Add "Learn more" link below Baby Globe on Minerva (T417077), Update Qids to initial public version, Escape the unescaped i18n messages (T410091), Do not show companion when visual editor is active (T417078), [[gerrit:1239400|Setting $wgWp25EasterEggsEnable to true for Wikipedias. (T4171
- 09:22 hashar@deploy2002: jdrewniak, jhsoby, hashar, stang: Continuing with sync
- 09:16 hashar@deploy2002: jdrewniak, jhsoby, hashar, stang: Backport for Add "Learn more" link below Baby Globe on Minerva (T417077), Update Qids to initial public version, Escape the unescaped i18n messages (T410091), Do not show companion when visual editor is active (T417078), [[gerrit:1239400|Setting $wgWp25EasterEggsEnable to true for Wikipedias
- 08:52 hashar@deploy2002: Started scap sync-world: Backport for Add "Learn more" link below Baby Globe on Minerva (T417077), Update Qids to initial public version, Escape the unescaped i18n messages (T410091), Do not show companion when visual editor is active (T417078), [[gerrit:1239400|Setting $wgWp25EasterEggsEnable to true for Wikipedias. (T41711
- 08:39 mszwarc@deploy2002: Finished scap sync-world: Backport for Add infobox case handling for Special:IPContributions (T417250) (duration: 36m 38s)
- 08:26 mszwarc@deploy2002: mszwarc, kharlan: Continuing with sync
- 08:26 mszwarc@deploy2002: mszwarc, kharlan: Backport for Add infobox case handling for Special:IPContributions (T417250) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:24 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 08:21 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 08:03 mszwarc@deploy2002: Started scap sync-world: Backport for Add infobox case handling for Special:IPContributions (T417250)
- 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1022.eqiad.wmnet with OS trixie
- 07:16 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:13 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:12 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:08 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:07 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
- 07:07 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:01 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1022.eqiad.wmnet with reason: host reimage
- 06:59 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org
- 06:55 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 06:55 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org
- 06:45 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1022.eqiad.wmnet with OS trixie
- 06:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2147 (T415786)', diff saved to https://phabricator.wikimedia.org/P88824 and previous config saved to /var/cache/conftool/dbconfig/20260216-061940-marostegui.json
- 06:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
- 06:18 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 46s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-15
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 58s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-14
- 21:09 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
- 21:09 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T410589)', diff saved to https://phabricator.wikimedia.org/P88823 and previous config saved to /var/cache/conftool/dbconfig/20260214-210906-ladsgroup.json
- 20:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P88822 and previous config saved to /var/cache/conftool/dbconfig/20260214-205858-ladsgroup.json
- 20:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P88821 and previous config saved to /var/cache/conftool/dbconfig/20260214-204849-ladsgroup.json
- 20:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T410589)', diff saved to https://phabricator.wikimedia.org/P88820 and previous config saved to /var/cache/conftool/dbconfig/20260214-203841-ladsgroup.json
- 17:57 dzahn@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Alexandros Kosiaris out of all services on: 2447 hosts
- 10:02 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1226 (T410589)', diff saved to https://phabricator.wikimedia.org/P88819 and previous config saved to /var/cache/conftool/dbconfig/20260214-100210-ladsgroup.json
- 10:02 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
- 10:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T410589)', diff saved to https://phabricator.wikimedia.org/P88818 and previous config saved to /var/cache/conftool/dbconfig/20260214-100145-ladsgroup.json
- 09:51 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88817 and previous config saved to /var/cache/conftool/dbconfig/20260214-095137-ladsgroup.json
- 09:41 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P88816 and previous config saved to /var/cache/conftool/dbconfig/20260214-094129-ladsgroup.json
- 09:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T410589)', diff saved to https://phabricator.wikimedia.org/P88815 and previous config saved to /var/cache/conftool/dbconfig/20260214-093121-ladsgroup.json
- 03:37 brett: Import lua5.4-maxminddb 0.1.1~deb13u1 into trixie-wikimedia (T401832)
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 02s)
- 02:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:12 mutante: cumin1003 - race condition between puppet and systemd - puppet fails because userdel fails. userdel fails because there is still a pid used by the user. that process is /lib/systemd/systemd --user - ran "loginctl terminate-user"; killed tmux PID; ran puppet again T417465
2026-02-13
- 23:25 brett: Import prometheus-varnishkafka-exporter 0.1~deb13u1 into trixie-wikimedia (T401832)
- 23:01 brett: Import varnishkafka 1.2.0~deb13+wmf1 into trixie-wikimedia (T401832)
- 22:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1214 (T410589)', diff saved to https://phabricator.wikimedia.org/P88814 and previous config saved to /var/cache/conftool/dbconfig/20260213-224820-ladsgroup.json
- 22:48 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
- 22:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T410589)', diff saved to https://phabricator.wikimedia.org/P88813 and previous config saved to /var/cache/conftool/dbconfig/20260213-224806-ladsgroup.json
- 22:37 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P88812 and previous config saved to /var/cache/conftool/dbconfig/20260213-223758-ladsgroup.json
- 22:30 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mwlog2003.codfw.wmnet with OS bookworm
- 22:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 22:29 sukhe@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on cp2043.codfw.wmnet with reason: host not provisioned yet
- 22:28 sukhe: disable puppet on cp2043: test host, wmfuniq service broken (needs upgrade for trixie)
- 22:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P88811 and previous config saved to /var/cache/conftool/dbconfig/20260213-222749-ladsgroup.json
- 22:25 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
- 22:17 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T410589)', diff saved to https://phabricator.wikimedia.org/P88810 and previous config saved to /var/cache/conftool/dbconfig/20260213-221741-ladsgroup.json
- 22:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mwlog2003.codfw.wmnet with reason: host reimage
- 22:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mwlog2003.codfw.wmnet with reason: host reimage
- 21:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2015.codfw.wmnet with OS trixie
- 21:40 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mwlog2003.codfw.wmnet with OS bookworm
- 21:39 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host mwlog2003.codfw.wmnet with OS bookworm
- 20:54 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host mwlog2003.codfw.wmnet with OS bookworm
- 20:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd2008-dev
- 20:41 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2008-dev
- 20:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host cloudcephosd2008-dev.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephosd2008-dev
- 20:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd2008-dev
- 20:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2015.codfw.wmnet with OS trixie
- 20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudcephosd2008-dev to codfw - jhancock@cumin2002"
- 20:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding cloudcephosd2008-dev to codfw - jhancock@cumin2002"
- 20:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['backup2015']
- 20:37 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2015']
- 20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 20:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 20:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 19:55 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 365 days, 0:00:00 on cp[2043-2044].codfw.wmnet with reason: These are test instances, not prod yet
- 19:30 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 19:30 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mwlog2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:53 vgutierrez: upload lua5.4-maxminddb_0.1.1-1 to bullseye-wikimedia (apt.wm.o) - T417291
- 16:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host mwlog2003.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host mwlog2003
- 16:50 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host mwlog2003
- 16:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2020.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2019.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2018.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 16:15 jhancock@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2346.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:07 jhancock@cumin1003: START - Cookbook sre.hosts.provision for host wikikube-worker2346.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
- 16:07 jhancock@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2346
- 16:06 jhancock@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2346
- 16:06 jhancock@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:06 jhancock@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2015 to codfw - jhancock@cumin1003"
- 16:06 jhancock@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2015 to codfw - jhancock@cumin1003"
- 16:00 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:00 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:00 jhancock@cumin1003: START - Cookbook sre.dns.netbox
- 15:42 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:42 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:27 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1002.eqiad.wmnet
- 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:27 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:25 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:20 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 15:16 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest1002.eqiad.wmnet
- 15:09 btullis@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host wdqs1028.eqiad.wmnet
- 14:25 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host wdqs1028.eqiad.wmnet
- 13:59 moritzm: upload a backport of OpenJDK 21.0.10 to component/jdk21 for the Gerrit migration to Bookworm T392465
- 13:44 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:39 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts sretest1004.eqiad.wmnet
- 13:39 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:39 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest1004.eqiad.wmnet
- 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts sretest1003.eqiad.wmnet
- 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:32 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:32 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: sretest1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 13:26 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 13:20 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts sretest1003.eqiad.wmnet
- 12:49 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:42 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:16 marostegui: Deploy schema change on x1 master with replication https://phabricator.wikimedia.org/T417386
- 11:05 moritzm: uploaded wmf-laptop 1.0.5 to component/wmf-laptop
- 10:53 elukey@cumin1003: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 10:34 elukey@cumin1003: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 10:16 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 10:16 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 09:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1203 (T410589)', diff saved to https://phabricator.wikimedia.org/P88807 and previous config saved to /var/cache/conftool/dbconfig/20260213-095855-ladsgroup.json
- 09:58 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
- 09:58 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T410589)', diff saved to https://phabricator.wikimedia.org/P88806 and previous config saved to /var/cache/conftool/dbconfig/20260213-095830-ladsgroup.json
- 09:48 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P88805 and previous config saved to /var/cache/conftool/dbconfig/20260213-094822-ladsgroup.json
- 09:38 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P88804 and previous config saved to /var/cache/conftool/dbconfig/20260213-093814-ladsgroup.json
- 09:28 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T410589)', diff saved to https://phabricator.wikimedia.org/P88803 and previous config saved to /var/cache/conftool/dbconfig/20260213-092806-ladsgroup.json
- 07:44 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 07:09 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:08 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org
- 07:08 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org
- 07:02 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 15s)
- 02:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:22 zabe@deploy2002: Finished scap sync-world: Backport for file: Stop setting 'omit-nonlazy' while loading extra rows from the db (T417301) (duration: 08m 15s)
- 00:17 zabe@deploy2002: zabe: Continuing with sync
- 00:15 zabe@deploy2002: zabe: Backport for file: Stop setting 'omit-nonlazy' while loading extra rows from the db (T417301) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:13 zabe@deploy2002: Started scap sync-world: Backport for file: Stop setting 'omit-nonlazy' while loading extra rows from the db (T417301)
2026-02-12
- 22:32 jdrewniak@deploy2002: Finished scap sync-world: Backport for Enabling sitenotices on Minerva on all Wikipedias (T416644), Partially enable WP25EasterEggs on wikipedias. (T417115) (duration: 07m 58s)
- 22:30 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 22:30 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
- 22:30 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 22:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 22:30 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 22:28 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 22:26 jdrewniak@deploy2002: jdrewniak: Backport for Enabling sitenotices on Minerva on all Wikipedias (T416644), Partially enable WP25EasterEggs on wikipedias. (T417115) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:24 jdrewniak@deploy2002: Started scap sync-world: Backport for Enabling sitenotices on Minerva on all Wikipedias (T416644), Partially enable WP25EasterEggs on wikipedias. (T417115)
- 22:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:19 jdrewniak@deploy2002: Finished scap sync-world: Backport for Allow enabling extension via query parameter for testing (T416218), resources: Squeeze static images more without visible quality loss (T417307) (duration: 07m 44s)
- 22:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:15 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 22:14 jdrewniak@deploy2002: jdrewniak: Backport for Allow enabling extension via query parameter for testing (T416218), resources: Squeeze static images more without visible quality loss (T417307) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:12 jdrewniak@deploy2002: Started scap sync-world: Backport for Allow enabling extension via query parameter for testing (T416218), resources: Squeeze static images more without visible quality loss (T417307)
- 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2017.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2016.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2015.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 22:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2020
- 22:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2020
- 22:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2019
- 22:04 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2019
- 22:04 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2018
- 22:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2018
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2017
- 22:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2017
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2016
- 22:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2016
- 22:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup2015
- 22:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host backup2015
- 22:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 22:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2015 to codfw - jhancock@cumin2002"
- 22:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding backup2015 to codfw - jhancock@cumin2002"
- 21:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
- 21:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1193 (T410589)', diff saved to https://phabricator.wikimedia.org/P88799 and previous config saved to /var/cache/conftool/dbconfig/20260212-211439-ladsgroup.json
- 21:14 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
- 21:14 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T410589)', diff saved to https://phabricator.wikimedia.org/P88798 and previous config saved to /var/cache/conftool/dbconfig/20260212-211414-ladsgroup.json
- 21:04 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P88797 and previous config saved to /var/cache/conftool/dbconfig/20260212-210406-ladsgroup.json
- 20:53 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260212-205353-ladsgroup.json
- 20:43 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T410589)', diff saved to https://phabricator.wikimedia.org/P88795 and previous config saved to /var/cache/conftool/dbconfig/20260212-204345-ladsgroup.json
- 18:10 dzahn@dns1004: END - running authdns-update
- 18:09 dzahn@dns1004: START - running authdns-update
- 17:51 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7003.magru.wmnet with OS trixie
- 17:40 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3006.esams.wmnet with OS trixie
- 17:34 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Revert^4 "IPReputation: Switch to OpenSearch backend" (duration: 09m 41s)
- 17:30 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 17:27 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 17:26 dreamyjazz@deploy2002: dreamyjazz: Backport for Revert^4 "IPReputation: Switch to OpenSearch backend" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:24 dreamyjazz@deploy2002: Started scap sync-world: Backport for Revert^4 "IPReputation: Switch to OpenSearch backend"
- 17:23 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3006.esams.wmnet with reason: host reimage
- 17:19 moritzm: installing zsh updates from Bookworm point release
- 17:19 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7003.magru.wmnet with reason: host reimage
- 17:18 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3006.esams.wmnet with reason: host reimage
- 16:51 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum3006.esams.wmnet with OS trixie
- 16:50 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7003.magru.wmnet with OS trixie
- 16:50 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 16:45 papaul: power off cr2-codfw fpc5
- 16:44 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum7004.magru.wmnet with OS trixie
- 16:38 ladsgroup@deploy2002: Finished scap sync-world: Backport for thumbUrl: Adjust the samll size to match common standard sizes (T414805), thumbUrl: Adjust the samll size to match common standard sizes (T414805) (duration: 07m 37s)
- 16:36 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host durum3005.esams.wmnet with OS trixie
- 16:34 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 16:33 ladsgroup@deploy2002: ladsgroup: Backport for thumbUrl: Adjust the samll size to match common standard sizes (T414805), thumbUrl: Adjust the samll size to match common standard sizes (T414805) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:31 ladsgroup@deploy2002: Started scap sync-world: Backport for thumbUrl: Adjust the samll size to match common standard sizes (T414805), thumbUrl: Adjust the samll size to match common standard sizes (T414805)
- 16:29 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
- 16:24 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum7004.magru.wmnet with reason: host reimage
- 16:23 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
- 16:23 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
- 16:23 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
- 16:22 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
- 16:22 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
- 16:22 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
- 16:22 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
- 16:22 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
- 16:22 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
- 16:21 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
- 16:21 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
- 16:21 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
- 16:21 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-experimental: apply
- 16:21 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-experimental: apply
- 16:21 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-experimental: apply
- 16:21 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-experimental: apply
- 16:20 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
- 16:20 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
- 16:20 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
- 16:20 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
- 16:19 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
- 16:19 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
- 16:19 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
- 16:19 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
- 16:18 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
- 16:18 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
- 16:18 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
- 16:18 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
- 16:17 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
- 16:16 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
- 16:16 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
- 16:16 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
- 16:14 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on durum3005.esams.wmnet with reason: host reimage
- 16:08 sukhe@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on durum3005.esams.wmnet with reason: host reimage
- 15:57 vgutierrez: downgrade to HAProxy 2.8 in cp4052 - T417291
- 15:53 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum7004.magru.wmnet with OS trixie
- 15:47 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:43 sukhe@cumin1003: START - Cookbook sre.hosts.reimage for host durum3005.esams.wmnet with OS trixie
- 15:42 vgutierrez: testing HAProxy 3.0.15 in cp4052 - T417291
- 15:38 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1035.eqiad.wmnet
- 15:38 eevans@cumin1003: START - Cookbook sre.hosts.remove-downtime for restbase1035.eqiad.wmnet
- 15:37 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
- 15:37 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
- 15:37 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 15:36 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 15:31 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:30 eevans@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on restbase1035.eqiad.wmnet with reason: Checking mountpoint ownership & permissions
- 15:23 vgutierrez: fetch haproxy 3.0.15 on thirdparty/haproxy30 (bullseye-wikimedia) - T401832
- 15:18 jgreen@dns1004: END - running authdns-update
- 15:18 fceratto@deploy2002: helmfile [aux-k8s-eqiad] 'sync' command on namespace 'zarcillo' for release 'main' .
- 15:16 jgreen@dns1004: START - running authdns-update
- 15:07 Lucas_WMDE: UTC afternoon backport+config window done
- 15:07 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/mobile/copyright/wikipedia-tagline-kaj.svg\n' | mwscript-k8s --comment=T415038 --attach -- purgeList enwiki
- 15:06 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for kajwiki: fix tagline (T415038) (duration: 08m 02s)
- 15:02 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Continuing with sync
- 15:01 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Backport for kajwiki: fix tagline (T415038) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:58 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for kajwiki: fix tagline (T415038)
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1019.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1018.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1017.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1016.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1015.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1013.eqiad.wmnet
- 14:58 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1012.eqiad.wmnet
- 14:57 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1011.eqiad.wmnet
- 14:57 btullis@puppetserver1001: conftool action : set/pooled=no; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1010.eqiad.wmnet
- 14:56 blake@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
- 14:56 blake@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
- 14:55 btullis@puppetserver1001: conftool action : set/pooled=yes; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1014.eqiad.wmnet
- 14:55 blake@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
- 14:55 btullis@puppetserver1001: conftool action : set/weight=1; selector: service=kubesvc,cluster=dse-k8s,dc=eqiad,name=dse-k8s-worker1014.eqiad.wmnet
- 14:55 blake@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
- 14:52 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 9 hosts with reason: shut off 1Gbps hosts
- 14:49 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for pplwiki: add logos (T415046) (duration: 10m 00s)
- 14:49 moritzm: uploaded bird2 2.18-1~wmf12u2 to the main component of bookworm-wikimedia T413740
- 14:49 moritzm: upgrading cloudnet* to dnsmasq 2.92 T396864
- 14:45 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Continuing with sync
- 14:43 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 echo-subscriptions-web-reverted
- 14:41 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Backport for pplwiki: add logos (T415046) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:39 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for pplwiki: add logos (T415046)
- 14:37 moritzm: upgrading cloudvirt* to dnsmasq 2.92 T396864
- 14:35 Lucas_WMDE: printf 'https://en.wikipedia.org/static/images/project-logos/kajwiki%s.png\n' '-1.5x' '-2x' | mwscript-k8s --comment=T415038 --attach -- purgeList enwiki
- 14:35 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for kajwiki: add tagline (T415038) (duration: 07m 56s)
- 14:31 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Continuing with sync
- 14:29 lucaswerkmeister-wmde@deploy2002: anzx, lucaswerkmeister-wmde: Backport for kajwiki: add tagline (T415038) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:27 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for kajwiki: add tagline (T415038)
- 14:25 lucaswerkmeister-wmde@deploy2002: mwscript-k8s job started: namespaceDupes pplwiki --fix # T415046
- 14:25 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for pplwiki: set sitename, projectnamespace and timezone (T415046) (duration: 08m 27s)
- 14:20 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Continuing with sync
- 14:18 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, anzx: Backport for pplwiki: set sitename, projectnamespace and timezone (T415046) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for pplwiki: set sitename, projectnamespace and timezone (T415046)
- 14:12 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert^3 "IPReputation: Switch to OpenSearch backend" (T416164) (duration: 09m 13s)
- 14:07 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stran: Continuing with sync
- 14:04 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stran: Backport for Revert^3 "IPReputation: Switch to OpenSearch backend" (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:02 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert^3 "IPReputation: Switch to OpenSearch backend" (T416164)
- 12:14 btullis@cumin1003: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 11:40 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd1003.eqiad.wmnet with OS bookworm
- 11:27 btullis@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 11:19 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
- 11:14 marostegui: Rename tables on x1 T417172
- 11:12 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1003.eqiad.wmnet with reason: host reimage
- 11:12 reedy@deploy2002: Finished scap sync-world: Backport for Allow a selection of third-level wmcloud/toolforge domains for UrlShortener (T413211), CommonSettings: Temporarily set $wgOATHUserHandlesTable = true (T416544) (duration: 06m 48s)
- 11:07 reedy@deploy2002: filippo, reedy: Continuing with sync
- 11:07 reedy@deploy2002: filippo, reedy: Backport for Allow a selection of third-level wmcloud/toolforge domains for UrlShortener (T413211), CommonSettings: Temporarily set $wgOATHUserHandlesTable = true (T416544) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:05 reedy@deploy2002: Started scap sync-world: Backport for Allow a selection of third-level wmcloud/toolforge domains for UrlShortener (T413211), CommonSettings: Temporarily set $wgOATHUserHandlesTable = true (T416544)
- 11:04 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd1003.eqiad.wmnet with OS bookworm
- 10:47 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd1002.eqiad.wmnet with OS bookworm
- 10:37 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
- 10:28 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
- 10:27 btullis@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1135.eqiad.wmnet
- 10:18 jmm@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling restart_daemons on A:kafka-test-eqiad
- 10:18 btullis@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-worker1135.eqiad.wmnet
- 10:09 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
- 10:02 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1002.eqiad.wmnet with reason: host reimage
- 10:01 jmm@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling restart_daemons on A:kafka-test-eqiad
- 09:49 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd1002.eqiad.wmnet with OS bookworm
- 09:47 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd1001.eqiad.wmnet with OS bookworm
- 09:37 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 09:24 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling restart_daemons on A:ldap-replicas-codfw
- 09:23 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling restart_daemons on A:ldap-replicas-codfw
- 09:23 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 09:12 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.15 refs T413806
- 08:31 moritzm: installing wmf-certificates 0_~20260209 on bullseye hosts T415255
- 08:28 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
- 08:26 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1192 (T410589)', diff saved to https://phabricator.wikimedia.org/P88793 and previous config saved to /var/cache/conftool/dbconfig/20260212-082601-ladsgroup.json
- 08:25 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
- 08:25 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T410589)', diff saved to https://phabricator.wikimedia.org/P88792 and previous config saved to /var/cache/conftool/dbconfig/20260212-082537-ladsgroup.json
- 08:22 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd1001.eqiad.wmnet with reason: host reimage
- 08:15 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P88791 and previous config saved to /var/cache/conftool/dbconfig/20260212-081529-ladsgroup.json
- 08:11 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd1001.eqiad.wmnet with OS bookworm
- 08:05 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260212-080516-ladsgroup.json
- 08:01 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:59 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:55 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T410589)', diff saved to https://phabricator.wikimedia.org/P88789 and previous config saved to /var/cache/conftool/dbconfig/20260212-075508-ladsgroup.json
- 07:52 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.sync-instances (exit_code=0) sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 07:49 moritzm: installing wmf-certificates 0_~20260209 on bookworm hosts T415255
- 07:37 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org
- 07:37 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org
- 07:36 arnaudb@cumin1003: START - Cookbook sre.gerrit.sync-instances sync Gerrit data from gerrit1003.wikimedia.org to gerrit2003.wikimedia.org
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 01s)
- 02:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-11
- 23:23 jdrewniak@deploy2002: Finished scap sync-world: Backport for Localisation updates from https://translatewiki.net., Use HashConfig in tests instead of mocking it, Add Baby Globe click interactions (T416362), Localisation updates from https://translatewiki.net., [[gerrit:1238843|Add configuration variable to control default companion visibility
- 23:19 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 23:14 jdrewniak@deploy2002: jdrewniak: Backport for Localisation updates from https://translatewiki.net., Use HashConfig in tests instead of mocking it, Add Baby Globe click interactions (T416362), Localisation updates from https://translatewiki.net., Add configuration variable to control default companion visibility (T417076), [[
- 23:09 jdrewniak@deploy2002: Started scap sync-world: Backport for Localisation updates from https://translatewiki.net., Use HashConfig in tests instead of mocking it, Add Baby Globe click interactions (T416362), Localisation updates from https://translatewiki.net., [[gerrit:1238843|Add configuration variable to control default companion visibility (
- 22:09 cjming: end of UTC late backport window
- 22:07 cjming@deploy2002: Finished scap sync-world: Backport for Fix contextual attributes when stream is set (T417091) (duration: 06m 41s)
- 22:03 cjming@deploy2002: cjming: Continuing with sync
- 22:03 cjming@deploy2002: cjming: Backport for Fix contextual attributes when stream is set (T417091) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:01 cjming@deploy2002: Started scap sync-world: Backport for Fix contextual attributes when stream is set (T417091)
- 21:53 cjming@deploy2002: Finished scap sync-world: Backport for Mock Experiment class in ExperimentTestKitchenManager unit test (duration: 06m 48s)
- 21:49 cjming@deploy2002: cjming: Continuing with sync
- 21:48 cjming@deploy2002: cjming: Backport for Mock Experiment class in ExperimentTestKitchenManager unit test synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:46 cjming@deploy2002: Started scap sync-world: Backport for Mock Experiment class in ExperimentTestKitchenManager unit test
- 21:34 cjming@deploy2002: Finished scap sync-world: Backport for Remove A/B test for hCaptcha editing (T410354) (duration: 06m 53s)
- 21:30 cjming@deploy2002: kharlan, cjming: Continuing with sync
- 21:30 cjming@deploy2002: kharlan, cjming: Backport for Remove A/B test for hCaptcha editing (T410354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:27 cjming@deploy2002: Started scap sync-world: Backport for Remove A/B test for hCaptcha editing (T410354)
- 21:25 cjming@deploy2002: Finished scap sync-world: Backport for Configure rate limit class for global bots (T415588), Remove the wgGlobalWatchlistWikibaseSite variable values (T415440) (duration: 07m 43s)
- 21:20 cjming@deploy2002: cjming, matmarex, ikhitron: Continuing with sync
- 21:19 cjming@deploy2002: cjming, matmarex, ikhitron: Backport for Configure rate limit class for global bots (T415588), Remove the wgGlobalWatchlistWikibaseSite variable values (T415440) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:17 cjming@deploy2002: Started scap sync-world: Backport for Configure rate limit class for global bots (T415588), Remove the wgGlobalWatchlistWikibaseSite variable values (T415440)
- 21:12 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1132.eqiad.wmnet
- 21:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 21:12 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 21:12 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: an-worker1132.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - ryankemper@cumin2002"
- 21:08 kharlan@deploy2002: Finished scap sync-world: Backport for IPoid: Add configurable connectTimeout (T416164), IPoid: Add configurable connectTimeout (T416164) (duration: 08m 11s)
- 21:06 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
- 21:04 kharlan@deploy2002: stran, kharlan: Continuing with sync
- 21:03 kharlan@deploy2002: stran, kharlan: Backport for IPoid: Add configurable connectTimeout (T416164), IPoid: Add configurable connectTimeout (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:02 ryankemper@cumin2002: START - Cookbook sre.hosts.decommission for hosts an-worker1132.eqiad.wmnet
- 21:00 kharlan@deploy2002: Started scap sync-world: Backport for IPoid: Add configurable connectTimeout (T416164), IPoid: Add configurable connectTimeout (T416164)
- 20:48 ryankemper: T411919 Rebooting `an-worker1148` again. First reboot went great, but just double-checking, and also I'd removed a duplicate fstab entry so this will sanity check my change there
- 20:19 ryankemper: T411919 Rebooting `an-worker1148` to see if it comes back up properly after having switched the RAID card
- 19:32 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1178 (T410589)', diff saved to https://phabricator.wikimedia.org/P88784 and previous config saved to /var/cache/conftool/dbconfig/20260211-193220-ladsgroup.json
- 19:32 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
- 19:31 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T410589)', diff saved to https://phabricator.wikimedia.org/P88783 and previous config saved to /var/cache/conftool/dbconfig/20260211-193155-ladsgroup.json
- 19:21 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P88779 and previous config saved to /var/cache/conftool/dbconfig/20260211-192146-ladsgroup.json
- 19:11 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P88778 and previous config saved to /var/cache/conftool/dbconfig/20260211-191138-ladsgroup.json
- 19:01 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T410589)', diff saved to https://phabricator.wikimedia.org/P88777 and previous config saved to /var/cache/conftool/dbconfig/20260211-190130-ladsgroup.json
- 18:16 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:16 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:15 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:15 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:14 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:14 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:14 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:11 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:11 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:10 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:10 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:09 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:09 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:08 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:08 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:07 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:07 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:07 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:07 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:04 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:04 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:03 urbanecm@deploy2002: Finished scap sync-world: Backport for fix(SuggestedEdits): Fix inversed condition (T417195) (duration: 08m 06s)
- 17:59 urbanecm@deploy2002: urbanecm: Continuing with sync
- 17:58 urbanecm@deploy2002: urbanecm: Backport for fix(SuggestedEdits): Fix inversed condition (T417195) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:55 urbanecm@deploy2002: Started scap sync-world: Backport for fix(SuggestedEdits): Fix inversed condition (T417195)
- 17:52 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:52 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:50 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:50 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:50 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:46 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:44 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:43 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:43 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:43 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 17:43 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:43 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:42 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:42 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 17:42 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:24 urbanecm@deploy2002: Finished scap sync-world: Backport for fix(SuggestedEdits): Fix inversed condition (T417195) (duration: 07m 36s)
- 17:19 urbanecm@deploy2002: urbanecm: Continuing with sync
- 17:18 urbanecm@deploy2002: urbanecm: Backport for fix(SuggestedEdits): Fix inversed condition (T417195) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:16 urbanecm@deploy2002: Started scap sync-world: Backport for fix(SuggestedEdits): Fix inversed condition (T417195)
- 16:31 urbanecm@deploy2002: Finished scap sync-world: Backport for Growth: Enable GrowthExperiments on 19 additional wikis (T417019) (duration: 10m 52s)
- 16:27 btullis@cumin1003: END (ERROR) - Cookbook sre.druid.roll-restart-workers (exit_code=97) for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 16:24 urbanecm@deploy2002: urbanecm: Continuing with sync
- 16:23 urbanecm@deploy2002: urbanecm: Backport for Growth: Enable GrowthExperiments on 19 additional wikis (T417019) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:20 urbanecm@deploy2002: Started scap sync-world: Backport for Growth: Enable GrowthExperiments on 19 additional wikis (T417019)
- 16:20 elukey@deploy2002: Finished scap sync-world: rollout full images after the docker registry backends has been restored to swift. (duration: 39m 17s)
- 16:08 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: wmf-certificates update - jmm@cumin2002
- 15:58 btullis@cumin1003: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
- 15:48 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: wmf-certificates update - jmm@cumin2002
- 15:42 elukey@deploy2002: Started scap sync-world: rollout full images after the docker registry backends has been restored to swift.
- 15:38 elukey: [ROLLBACK] move the Docker Registry's /v2/restricted prefix (MW Images) to the s3/apus backend - T412951
- 15:30 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet [reason: [end] testing puppet 5 changes]
- 15:28 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-ats (exit_code=0) rolling restart_daemons on P{cp7001*} and A:cp
- 15:27 sukhe@cumin1003: START - Cookbook sre.cdn.roll-restart-ats rolling restart_daemons on P{cp7001*} and A:cp
- 15:25 sukhe@cumin1003: END (PASS) - Cookbook sre.cdn.roll-restart-haproxy (exit_code=0) rolling restart of HAProxy on P{cp7001*} and A:cp - testing Puppet 5 cert removal ()
- 15:24 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1023.eqiad.wmnet with OS bullseye
- 15:24 sukhe@cumin1003: START - Cookbook sre.cdn.roll-restart-haproxy rolling restart of HAProxy on P{cp7001*} and A:cp - testing Puppet 5 cert removal ()
- 15:22 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:22 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:22 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:21 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:21 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:20 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:16 gengh@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
- 15:16 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet [reason: testing puppet 5 changes: moritzm]
- 15:15 gengh@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
- 15:15 gengh@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
- 15:15 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1023
- 15:15 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1023
- 15:15 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1023.eqiad.wmnet with OS bullseye
- 15:14 gengh@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
- 15:14 eevans@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aqs1023.eqiad.wmnet with OS bullseye
- 15:14 gengh@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
- 15:13 gengh@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
- 15:12 moritzm: upload wmf-certificates 0_~20260209 to bullseye-wikimedia/main (drops the Puppet 5 CA from the cert bundle) T415255
- 15:08 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.15 refs T413806
- 15:08 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd2002.codfw.wmnet with OS bookworm
- 15:07 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on sretest2003.codfw.wmnet with reason: test redfish
- 14:59 Emperor: restart codfw apus frontends T412951
- 14:54 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
- 14:48 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2002.codfw.wmnet with reason: host reimage
- 14:37 eevans@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aqs1023
- 14:37 eevans@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aqs1023
- 14:36 eevans@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aqs1023
- 14:36 eevans@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aqs1023.eqiad.wmnet 98.48.64.10.in-addr.arpa 8.9.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:36 eevans@cumin1003: START - Cookbook sre.dns.wipe-cache aqs1023.eqiad.wmnet 98.48.64.10.in-addr.arpa 8.9.0.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 14:36 eevans@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 14:36 eevans@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1023 - eevans@cumin1003"
- 14:36 eevans@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aqs1023 - eevans@cumin1003"
- 14:34 ayounsi@dns1004: END - running authdns-update
- 14:32 ayounsi@dns1004: START - running authdns-update
- 14:32 eevans@cumin1003: START - Cookbook sre.dns.netbox
- 14:31 eevans@cumin1003: START - Cookbook sre.hosts.move-vlan for host aqs1023
- 14:31 eevans@cumin1003: START - Cookbook sre.hosts.reimage for host aqs1023.eqiad.wmnet with OS bullseye
- 14:31 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd2002.codfw.wmnet with OS bookworm
- 14:29 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd2003.codfw.wmnet with OS bookworm
- 14:19 jnuche@deploy2002: Finished scap sync-world: Backport for Add 'menus' option to skin.json (T416981) (duration: 10m 02s)
- 14:15 jnuche@deploy2002: jnuche: Continuing with sync
- 14:15 jnuche@deploy2002: jnuche: Backport for Add 'menus' option to skin.json (T416981) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:13 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
- 14:10 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp2043.codfw.wmnet with OS trixie
- 14:09 jnuche@deploy2002: Started scap sync-world: Backport for Add 'menus' option to skin.json (T416981)
- 14:08 jnuche@deploy2002: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.14,1.46.0-wmf.15,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/me
- 14:07 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2003.codfw.wmnet with reason: host reimage
- 14:04 btullis@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on an-worker1148.eqiad.wmnet with reason: Replacing RAID card
- 14:01 jnuche@deploy2002: Started scap sync-world: Backport for Add 'menus' option to skin.json (T416981)
- 13:53 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd2003.codfw.wmnet with OS bookworm
- 13:52 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-etcd2001.codfw.wmnet with OS bookworm
- 13:52 slyngshede@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 13:46 slyngshede@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on cp2043.codfw.wmnet with reason: host reimage
- 13:43 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 gettingstarted-task-toolbar-show-intro
- 13:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2248: Upgrade mariadb
- 13:37 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2249: Upgrade mariadb
- 13:33 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
- 13:32 slyngshede@cumin1003: START - Cookbook sre.hosts.reimage for host cp2043.codfw.wmnet with OS trixie
- 13:28 Amir1: re-enabling rate limit of non-standard thumbnail sizes on medium browser score (T414805 T402792)
- 13:28 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-etcd2001.codfw.wmnet with reason: host reimage
- 13:12 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Stop writing old for CheckUser user agent table migration everywhere (T361206) (duration: 14m 10s)
- 13:11 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-etcd2001.codfw.wmnet with OS bookworm
- 13:08 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 13:03 dreamyjazz@deploy2002: dreamyjazz: Backport for Stop writing old for CheckUser user agent table migration everywhere (T361206) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:58 dreamyjazz@deploy2002: Started scap sync-world: Backport for Stop writing old for CheckUser user agent table migration everywhere (T361206)
- 12:53 kharlan@deploy2002: Finished scap sync-world: Backport for Revert^2 "IPReputation: Switch to OpenSearch backend" (T416164) (duration: 11m 06s)
- 12:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2248: Upgrade mariadb
- 12:52 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2249: Upgrade mariadb
- 12:51 moritzm: uploaded nodejs 16.15.1+dfsg-1~wmf12u1 to component/node16 for bookworm-wikimedia T416117
- 12:49 marostegui: Install mariadb 10.11.16 on debian trixie on db2249 and db2248 T416561
- 12:49 kharlan@deploy2002: kharlan: Continuing with sync
- 12:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2248.codfw.wmnet with reason: Upgrade mariadb
- 12:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2249.codfw.wmnet with reason: Upgrade mariadb
- 12:47 kharlan@deploy2002: kharlan: Backport for Revert^2 "IPReputation: Switch to OpenSearch backend" (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:46 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2249: Upgrade mariadb
- 12:45 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2249: Upgrade mariadb
- 12:42 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db2248: Upgrade mariadb
- 12:42 kharlan@deploy2002: Started scap sync-world: Backport for Revert^2 "IPReputation: Switch to OpenSearch backend" (T416164)
- 12:42 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db2248: Upgrade mariadb
- 12:39 kharlan@deploy2002: sync-world failed: <CalledProcessError> Command 'sudo -u mwbuilder /srv/mwbuilder/release/make-container-image/build-images.py --http-proxy http://webproxy:8080 --https-proxy http://webproxy:8080 /srv/mediawiki-staging/scap/image-build --staging-dir /srv/mediawiki-staging --mediawiki-versions 1.46.0-wmf.14,1.46.0-wmf.15,next --multiversion-image-basename docker-registry.discovery.wmnet/restricted/m
- 12:31 kharlan@deploy2002: Started scap sync-world: Backport for Revert^2 "IPReputation: Switch to OpenSearch backend" (T416164)
- 12:31 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on sretest2010.codfw.wmnet with reason: test redfish
- 12:23 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.15 refs T413806
- 12:06 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
- 12:06 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
- 12:05 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
- 12:05 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
- 12:04 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
- 12:03 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
- 11:54 elukey@deploy2002: Finished scap sync-world: Test the new s3/apus docker registry backend - full reimage rebuild (duration: 38m 01s)
- 11:16 elukey@deploy2002: Started scap sync-world: Test the new s3/apus docker registry backend - full reimage rebuild
- 11:14 blake@deploy2002: Finished scap sync-world: (no justification provided) (duration: 03m 01s)
- 11:12 blake@deploy2002: Started scap sync-world: (no justification provided)
- 11:08 elukey@deploy2002: Finished scap sync-world: Test the new s3/apus docker registry backend (duration: 03m 08s)
- 11:06 elukey@deploy2002: Started scap sync-world: Test the new s3/apus docker registry backend
- 11:04 elukey: move the Docker Registry's /v2/restricted prefix (MW Images) to the s3/apus backend - T412951
- 10:58 ammarpad@deploy2002: mwscript-k8s job started: extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=foundationwiki --logwiki=metawiki 'Katya blinda' 'Renamed user 540d715cf480e5aab16c5eeb86d6eca2' # T417144
- 10:43 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging-etcd2002.codfw.wmnet with OS bookworm
- 10:20 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on sretest2010.codfw.wmnet with reason: test redfish
- 10:05 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 09:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 999 days, 0:00:00 on db2230.codfw.wmnet with reason: testbed used for experiments
- 09:55 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 999 days, 0:00:00 on db1176.eqiad.wmnet with reason: testbed used for experiments
- 09:50 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.15 refs T413806
- 09:41 XioNoX: cloudsw1-b1-codfw> request system reboot - T416443
- 09:39 jnuche@deploy2002: Finished scap sync-world: Backport for WikimediaApiPortal should declare menus it uses (T416981) (duration: 06m 55s)
- 09:35 jnuche@deploy2002: jnuche: Continuing with sync
- 09:35 jnuche@deploy2002: jnuche: Backport for WikimediaApiPortal should declare menus it uses (T416981) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:32 jnuche@deploy2002: Started scap sync-world: Backport for WikimediaApiPortal should declare menus it uses (T416981)
- 09:19 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 999 days, 0:00:00 on db-test[2001-2002].codfw.wmnet,db-test[1001-1003].eqiad.wmnet with reason: testbed used for experiments
- 09:16 fceratto@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on dborch1002.wikimedia.org with reason: in setup
- 09:11 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.15 refs T413806
- 08:57 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:54 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:52 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:52 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:48 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:45 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 08:24 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
- 08:18 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2002.codfw.wmnet with reason: host reimage
- 08:16 moritzm: installing wmf-certificates 0_~20260209 on bookworm hosts T415255
- 08:15 cjming@deploy2002: Finished scap sync-world: Backport for Add `mediawiki.product_metrics.contributors.experiments` to `wgTestKitchenExperimentStreamNames` (T417091) (duration: 09m 09s)
- 08:11 cjming@deploy2002: cjming: Continuing with sync
- 08:08 cjming@deploy2002: cjming: Backport for Add `mediawiki.product_metrics.contributors.experiments` to `wgTestKitchenExperimentStreamNames` (T417091) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:06 cjming@deploy2002: Started scap sync-world: Backport for Add `mediawiki.product_metrics.contributors.experiments` to `wgTestKitchenExperimentStreamNames` (T417091)
- 08:01 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-staging-etcd2002.codfw.wmnet with OS bookworm
- 07:31 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1026.eqiad.wmnet with OS trixie
- 07:18 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool pc1011: Repool pc1 after MariaDB upgrade
- 07:18 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 07:18 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 07:18 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool pc1011: Repool pc1 after MariaDB upgrade
- 07:15 marostegui: Upgrade pc1 to 10.11.16 T416561
- 07:14 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
- 07:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Upgrade mariadb
- 07:12 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool pc1011: Depool for MariaDB upgrade
- 07:12 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.parsercache (exit_code=0)
- 07:12 marostegui@cumin1003: START - Cookbook sre.mysql.parsercache
- 07:12 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool pc1011: Depool for MariaDB upgrade
- 07:10 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1026.eqiad.wmnet with reason: host reimage
- 06:54 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1026.eqiad.wmnet with OS trixie
- 06:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1177 (T410589)', diff saved to https://phabricator.wikimedia.org/P88762 and previous config saved to /var/cache/conftool/dbconfig/20260211-063051-ladsgroup.json
- 06:30 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
- 06:30 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T410589)', diff saved to https://phabricator.wikimedia.org/P88761 and previous config saved to /var/cache/conftool/dbconfig/20260211-063027-ladsgroup.json
- 06:20 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P88760 and previous config saved to /var/cache/conftool/dbconfig/20260211-062018-ladsgroup.json
- 06:10 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P88759 and previous config saved to /var/cache/conftool/dbconfig/20260211-061010-ladsgroup.json
- 06:00 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T410589)', diff saved to https://phabricator.wikimedia.org/P88758 and previous config saved to /var/cache/conftool/dbconfig/20260211-060002-ladsgroup.json
- 04:10 ryankemper: `[kafka-jumbo]` T411568 `kafka-jumbo` reboots done, and everything looks healthy
- 03:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
- 02:04 ryankemper: `[kafka-jumbo]` T411568 Rebooting kafka-jumbo one host at a time (sec updates)
- 02:03 ryankemper@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 00m 25s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:33 brett: Import libvmod-re2 2.0.0-2~deb13+wmf2 into trixie-wikimedia - T401832
- 00:08 ryankemper@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
2026-02-10
- 23:32 ryankemper@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
- 23:04 ryankemper: `[opensearch-ipoid]` T416345 Restarted (~16 mins ago) `opensearch-ipoid-masters-1` to force it to schedule to a k8s node with 10G networking, hopefully this helps w/ latency
- 23:03 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Stop writing old for CheckUser user agent table migration on group1 (T361206) (duration: 09m 05s)
- 23:02 jasmine_: "homer on lsw*codfw* to remove bgp for wikikube-workers T409104"
- 22:59 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 22:57 dreamyjazz@deploy2002: dreamyjazz: Backport for Stop writing old for CheckUser user agent table migration on group1 (T361206) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:54 dreamyjazz@deploy2002: Started scap sync-world: Backport for Stop writing old for CheckUser user agent table migration on group1 (T361206)
- 22:40 kharlan@deploy2002: Finished scap sync-world: Backport for ConfirmEdit: Add error count threshold for apiUrl health checks (T416817) (duration: 12m 34s)
- 22:34 kharlan@deploy2002: kharlan: Continuing with sync
- 22:32 kharlan@deploy2002: kharlan: Backport for ConfirmEdit: Add error count threshold for apiUrl health checks (T416817) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:31 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release 20260210
- 22:28 kharlan@deploy2002: Started scap sync-world: Backport for ConfirmEdit: Add error count threshold for apiUrl health checks (T416817)
- 22:23 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release 20260210
- 22:21 dzahn@cumin2002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release 20260210
- 22:12 dzahn@cumin2002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release 20260210
- 21:59 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check suggestions beta-feature in allowlist (T399611), Turn on Parsoid read views by default on labs (take 2) (T357054), Update title / desc of Special:LintTemplateErrors (T170874) (duration: 39m 46s)
- 21:46 kemayo@deploy2002: cscott, arlolra, kemayo: Continuing with sync
- 21:44 kemayo@deploy2002: cscott, arlolra, kemayo: Backport for Edit check suggestions beta-feature in allowlist (T399611), Turn on Parsoid read views by default on labs (take 2) (T357054), Update title / desc of Special:LintTemplateErrors (T170874) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:36 xcollazo@deploy2002: Finished deploy [analytics/refinery@f84b3cc] (thin): Regular analytics weekly train THIN [analytics/refinery@f84b3cc2] (duration: 01m 55s)
- 21:34 xcollazo@deploy2002: Started deploy [analytics/refinery@f84b3cc] (thin): Regular analytics weekly train THIN [analytics/refinery@f84b3cc2]
- 21:32 xcollazo@deploy2002: Finished deploy [analytics/refinery@f84b3cc]: Regular analytics weekly train [analytics/refinery@f84b3cc2] (duration: 04m 59s)
- 21:27 xcollazo@deploy2002: Started deploy [analytics/refinery@f84b3cc]: Regular analytics weekly train [analytics/refinery@f84b3cc2]
- 21:23 xcollazo@deploy2002: Finished deploy [analytics/refinery@f84b3cc] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f84b3cc2] (duration: 01m 57s)
- 21:21 xcollazo@deploy2002: Started deploy [analytics/refinery@f84b3cc] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@f84b3cc2]
- 21:19 kemayo@deploy2002: Started scap sync-world: Backport for Edit check suggestions beta-feature in allowlist (T399611), Turn on Parsoid read views by default on labs (take 2) (T357054), Update title / desc of Special:LintTemplateErrors (T170874)
- 21:13 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check: turn off the addReference a/b test on enwiki (T367343 T416378), Edit check: enable suggestions beta-feature on enwiki (T399611) (duration: 09m 58s)
- 21:09 kemayo@deploy2002: kemayo: Continuing with sync
- 21:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 21:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 21:05 kemayo@deploy2002: kemayo: Backport for Edit check: turn off the addReference a/b test on enwiki (T367343 T416378), Edit check: enable suggestions beta-feature on enwiki (T399611) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:03 kemayo@deploy2002: Started scap sync-world: Backport for Edit check: turn off the addReference a/b test on enwiki (T367343 T416378), Edit check: enable suggestions beta-feature on enwiki (T399611)
- 19:58 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker2241.codfw.wmnet
- 19:58 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:58 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2241.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:58 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2241.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:56 ryankemper@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on an-worker[1132,1148].eqiad.wmnet with reason: T411919
- 19:54 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 19:48 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: no reason specified, ]
- 19:48 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: no reason specified, ]
- 19:48 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker2241.codfw.wmnet
- 19:48 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2236-2240].codfw.wmnet
- 19:48 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:48 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2236-2240].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:47 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2236-2240].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:47 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs7003*} and A:liberica
- 19:47 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs7003.magru.wmnet} and A:liberica
- 19:47 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs7003.magru.wmnet} and A:liberica
- 19:46 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7003.magru.wmnet} and A:liberica
- 19:46 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7003.magru.wmnet} and A:liberica
- 19:46 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs7003*} and A:liberica
- 19:45 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs7002*} and A:liberica
- 19:45 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs7002.magru.wmnet} and A:liberica
- 19:45 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs7002.magru.wmnet} and A:liberica
- 19:45 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7002.magru.wmnet} and A:liberica
- 19:44 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7002.magru.wmnet} and A:liberica
- 19:44 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs7002*} and A:liberica
- 19:44 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 19:41 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs7001*} and A:liberica
- 19:41 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) pooling P{lvs7001.magru.wmnet} and A:liberica
- 19:41 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin pooling P{lvs7001.magru.wmnet} and A:liberica
- 19:40 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling P{lvs7001.magru.wmnet} and A:liberica
- 19:40 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin depooling P{lvs7001.magru.wmnet} and A:liberica
- 19:40 sukhe@cumin1003: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs7001*} and A:liberica
- 19:32 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2236-2240].codfw.wmnet
- 19:32 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2231-2235].codfw.wmnet
- 19:31 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:31 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2231-2235].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:31 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2231-2235].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:27 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 19:13 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2231-2235].codfw.wmnet
- 19:13 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2226-2230].codfw.wmnet
- 19:13 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:13 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2226-2230].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:12 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2226-2230].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:12 sukhe@cumin1003: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs7001*} and A:liberica
- 19:12 sukhe@cumin1003: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs7001*} and A:liberica
- 19:08 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 19:00 catrope@deploy2002: Finished scap sync-world: Backport for Revert "WebAuthnAuthenticator: Replace deprecated ::createFromArray()", Revert "WebAuthnKey: Replace deprecated PublicKeyCredentialLoader" (duration: 10m 03s)
- 18:56 catrope@deploy2002: catrope: Continuing with sync
- 18:53 catrope@deploy2002: catrope: Backport for Revert "WebAuthnAuthenticator: Replace deprecated ::createFromArray()", Revert "WebAuthnKey: Replace deprecated PublicKeyCredentialLoader" synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 18:51 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2226-2230].codfw.wmnet
- 18:50 catrope@deploy2002: Started scap sync-world: Backport for Revert "WebAuthnAuthenticator: Replace deprecated ::createFromArray()", Revert "WebAuthnKey: Replace deprecated PublicKeyCredentialLoader"
- 18:50 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2221-2225].codfw.wmnet
- 18:50 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:50 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2221-2225].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:50 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2221-2225].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:45 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:37 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: no reason specified, ]
- 18:36 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: no reason specified, ]
- 18:29 cdobbins@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site magru [reason: no reason specified, ]
- 18:29 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2221-2225].codfw.wmnet
- 18:29 cdobbins@cumin2002: START - Cookbook sre.dns.admin DNS admin: pool site magru [reason: no reason specified, ]
- 18:28 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2216-2220].codfw.wmnet
- 18:28 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:28 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2216-2220].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:27 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2216-2220].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:23 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:21 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting A:liberica and A:magru and A:liberica
- 18:12 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site magru [reason: no reason specified, ]
- 18:12 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site magru [reason: no reason specified, ]
- 18:11 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2216-2220].codfw.wmnet
- 18:08 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting A:liberica and A:magru and A:liberica
- 18:08 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2121-2123].codfw.wmnet
- 18:08 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:08 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2121-2123].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:07 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2121-2123].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:05 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:05 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:03 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:03 catrope@deploy2002: Finished scap sync-world: Backport for Revert "WebAuthnAuthenticator: Manually serialize PublicKeyCredentialCreationOptions/PublicKeyCredentialRequestOptions" (T417022) (duration: 16m 29s)
- 18:03 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:02 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:02 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 18:01 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:01 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 18:01 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:59 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:59 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:59 catrope@deploy2002: catrope: Continuing with sync
- 17:55 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:55 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:55 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 17:54 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:53 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2121-2123].codfw.wmnet
- 17:51 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2116-2120].codfw.wmnet
- 17:51 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:51 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2116-2120].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 17:51 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2116-2120].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 17:49 catrope@deploy2002: catrope: Backport for Revert "WebAuthnAuthenticator: Manually serialize PublicKeyCredentialCreationOptions/PublicKeyCredentialRequestOptions" (T417022) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:47 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:47 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:47 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 17:47 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:47 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:47 catrope@deploy2002: Started scap sync-world: Backport for Revert "WebAuthnAuthenticator: Manually serialize PublicKeyCredentialCreationOptions/PublicKeyCredentialRequestOptions" (T417022)
- 17:47 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:47 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:46 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:46 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:46 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 17:46 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 17:40 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1172 (T410589)', diff saved to https://phabricator.wikimedia.org/P88754 and previous config saved to /var/cache/conftool/dbconfig/20260210-174011-ladsgroup.json
- 17:40 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
- 17:34 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2116-2120].codfw.wmnet
- 17:26 jasmine@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2116-2123,2216-2241].codfw.wmnet
- 17:25 jasmine@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2116-2123,2216-2241].codfw.wmnet
- 16:40 moritzm: uploaded dnsmasq 2.92-1~wmf12 to trixie-wikimedia/main T396864
- 16:21 cmooney@cumin1003: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 20940
- 16:09 moritzm: installing Django security updates
- 16:09 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site drmrs [reason: work done, T416441]
- 16:09 sukhe@cumin1003: START - Cookbook sre.dns.admin DNS admin: pool site drmrs [reason: work done, T416441]
- 16:07 dzahn@dns1004: END - running authdns-update
- 16:06 mutante: switching gerrit to behind CDN
- 16:05 dzahn@dns1004: START - running authdns-update
- 16:04 sukhe@dns1004: END - running authdns-update
- 16:03 sukhe@dns1004: START - running authdns-update
- 16:03 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: cluster=dnsbox,dc=drmrs [reason: drmrs maintenance]
- 16:01 sukhe@cumin1003: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 39 hosts
- 16:00 sukhe@cumin1003: START - Cookbook sre.hosts.remove-downtime for 39 hosts
- 15:58 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
- 15:57 cmooney@cumin1003: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 20940
- 15:55 topranks: remove ACL entry permitting Cloud VPS private IP addresses direct access to gerrit.wikimedia.org T411895
- 15:53 cmooney@cumin1003: START - Cookbook sre.network.peering with action 'configure' for AS: 20940
- 15:46 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:46 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:44 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:44 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:44 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:41 XioNoX: mr1-drmrs> request system reboot - T416441
- 15:41 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mr1-drmrs,mr1-drmrs IPv6,mr1-drmrs.oob with reason: router upgrade
- 15:39 ayounsi@cumin1003: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on mr1-drmrs,mr1-drmrs IPv6,mr1-drmrs.mgmt with reason: router upgrade
- 15:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:37 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:37 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:37 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:36 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:36 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:35 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:35 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:35 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:35 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:34 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:34 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:33 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:33 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:33 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:33 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:32 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:32 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:30 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:30 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:30 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging-etcd2003.codfw.wmnet with OS bookworm
- 15:30 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:30 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:29 fceratto@cumin1003: END (PASS) - Cookbook sre.mysql.update-replication (exit_code=0)
- 15:29 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:14 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
- 15:10 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix mysql has gone away error in hp - oblivian@cumin1003"
- 15:10 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix mysql has gone away error in hp - oblivian@cumin1003
- 15:10 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix mysql has gone away error in hp - oblivian@cumin1003
- 15:09 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix mysql has gone away error in hp - oblivian@cumin1003"
- 15:08 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 15:08 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:08 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 15:08 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:08 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2003.codfw.wmnet with reason: host reimage
- 15:07 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 15:07 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 15:04 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudgw1004.eqiad.wmnet with OS trixie
- 15:02 XioNoX: cr2-drmrs> request vmhost reboot - T416441
- 14:55 fceratto@cumin1003: END (ERROR) - Cookbook sre.mysql.update-replication (exit_code=97)
- 14:55 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:53 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-staging-etcd2003.codfw.wmnet with OS bookworm
- 14:53 fceratto@cumin1003: END (FAIL) - Cookbook sre.mysql.update-replication (exit_code=99)
- 14:53 fceratto@cumin1003: START - Cookbook sre.mysql.update-replication
- 14:51 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-drmrs,cr2-drmrs IPv6,cr2-drmrs.mgmt with reason: router upgrade
- 14:47 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging-etcd2001.codfw.wmnet with OS bookworm
- 14:43 andrew@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudgw1004.eqiad.wmnet with reason: host reimage
- 14:41 XioNoX: asw1-b12-drmrs> request system reboot - T416441
- 14:40 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 20 hosts with reason: Switch upgrade
- 14:37 andrew@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudgw1004.eqiad.wmnet with reason: host reimage
- 14:24 XioNoX: cr1-drmrs> request vmhost reboot - T416441
- 14:20 moritzm: installing wmf-certificates 0_~20260209 on bookworm hosts T415255
- 14:20 andrew@cumin2002: START - Cookbook sre.hosts.reimage for host cloudgw1004.eqiad.wmnet with OS trixie
- 14:19 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: cluster=dnsbox,dc=drmrs [reason: drmrs maintenance]
- 14:05 ayounsi@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cr1-drmrs,cr1-drmrs IPv6,cr1-drmrs.mgmt with reason: router upgrade
- 14:03 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dborch1002.wikimedia.org with OS trixie
- 14:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site drmrs [reason: no reason specified, ]
- 14:02 ayounsi@cumin1003: START - Cookbook sre.dns.admin DNS admin: depool site drmrs [reason: no reason specified, ]
- 13:59 jforrester@deploy2002: Finished scap sync-world: Backport for EmailAuthHookHandler: Check if WikimediaEvents loaded before using WikimediaEventsCountryCodeLookup (T416983), More robust SkinTemplateNavigation hook handler (T416978) (duration: 07m 55s)
- 13:54 jforrester@deploy2002: jforrester: Continuing with sync
- 13:53 jforrester@deploy2002: jforrester: Backport for EmailAuthHookHandler: Check if WikimediaEvents loaded before using WikimediaEventsCountryCodeLookup (T416983), More robust SkinTemplateNavigation hook handler (T416978) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:51 jforrester@deploy2002: Started scap sync-world: Backport for EmailAuthHookHandler: Check if WikimediaEvents loaded before using WikimediaEventsCountryCodeLookup (T416983), More robust SkinTemplateNavigation hook handler (T416978)
- 13:50 moritzm: upload wmf-certificates 0_~20260209 to bookworm-wikimedia/main (drops the Puppet 5 CA from the cert bundle) T415255
- 13:48 fceratto@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
- 13:42 fceratto@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dborch1002.wikimedia.org with reason: host reimage
- 13:29 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host dborch1002.wikimedia.org with OS trixie
- 13:20 ladsgroup@deploy2002: Finished scap sync-world: Backport for Page: Bump the size of pdf thumbnail size to a standard size (T416620) (duration: 08m 12s)
- 13:18 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
- 13:16 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 13:14 ladsgroup@deploy2002: ladsgroup: Backport for Page: Bump the size of pdf thumbnail size to a standard size (T416620) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:13 dpogorzelski@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-etcd2001.codfw.wmnet with reason: host reimage
- 13:12 ladsgroup@deploy2002: Started scap sync-world: Backport for Page: Bump the size of pdf thumbnail size to a standard size (T416620)
- 12:57 dpogorzelski@cumin1003: START - Cookbook sre.hosts.reimage for host ml-staging-etcd2001.codfw.wmnet with OS bookworm
- 12:50 kharlan@deploy2002: Finished scap sync-world: Backport for Revert "IPReputation: Switch to OpenSearch backend" (T416164) (duration: 06m 40s)
- 12:46 kharlan@deploy2002: kharlan: Continuing with sync
- 12:46 kharlan@deploy2002: kharlan: Backport for Revert "IPReputation: Switch to OpenSearch backend" (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:44 kharlan@deploy2002: Started scap sync-world: Backport for Revert "IPReputation: Switch to OpenSearch backend" (T416164)
- 12:40 ladsgroup@deploy2002: Finished scap sync-world: Backport for LinksTable: Use replica database for fetching existing links (T416171), LinksTable: Use replica database for fetching existing links (T416171) (duration: 07m 48s)
- 12:36 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 12:35 ladsgroup@deploy2002: ladsgroup: Backport for LinksTable: Use replica database for fetching existing links (T416171), LinksTable: Use replica database for fetching existing links (T416171) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:32 ladsgroup@deploy2002: Started scap sync-world: Backport for LinksTable: Use replica database for fetching existing links (T416171), LinksTable: Use replica database for fetching existing links (T416171)
- 12:23 Amir1: mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 rememberpassword
- 12:23 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:21 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 12:18 kharlan@deploy2002: Finished scap sync-world: Backport for IPReputation: Bump request timeout limit (T416164) (duration: 06m 42s)
- 12:13 kharlan@deploy2002: kharlan: Continuing with sync
- 12:13 kharlan@deploy2002: kharlan: Backport for IPReputation: Bump request timeout limit (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 12:11 kharlan@deploy2002: Started scap sync-world: Backport for IPReputation: Bump request timeout limit (T416164)
- 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1005.wikimedia.org
- 11:58 kharlan@deploy2002: Finished scap sync-world: Backport for IPReputation: Switch to OpenSearch backend (T416164) (duration: 10m 08s)
- 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1005.wikimedia.org
- 11:53 kharlan@deploy2002: kharlan: Continuing with sync
- 11:50 kharlan@deploy2002: kharlan: Backport for IPReputation: Switch to OpenSearch backend (T416164) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 11:48 kharlan@deploy2002: Started scap sync-world: Backport for IPReputation: Switch to OpenSearch backend (T416164)
- 11:40 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 11:38 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:38 moritzm: installing wmf-certificates 0_~20260209 on trixie hosts T415255
- 11:37 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:19 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:18 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 11:16 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 11:15 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:22 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 10:21 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host bast1004.wikimedia.org
- 10:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host bast1004.wikimedia.org
- 09:30 moritzm: upload wmf-certificates 0_~20260209 to trixie-wikimedia/main (drops the Puppet 5 CA from the cert bundle) T415255
- 09:20 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:19 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:15 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.15 refs T413806
- 09:10 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db2218: After schema change
- 09:07 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:06 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:05 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 09:05 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 08:25 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db2218: After schema change
- 08:06 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1025.eqiad.wmnet with OS trixie
- 07:55 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) pool db1169: Upgrade mariadb
- 07:44 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
- 07:37 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1025.eqiad.wmnet with reason: host reimage
- 07:18 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host dbproxy1025.eqiad.wmnet with OS trixie
- 07:09 marostegui@cumin1003: START - Cookbook sre.mysql.pool pool db1169: Upgrade mariadb
- 07:02 marostegui: Upgrade mariadb on db1169 trixie T416561
- 07:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on db1169.eqiad.wmnet with reason: Upgrade mariadb
- 07:01 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool db1169: Upgrade mariadb
- 07:01 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool db1169: Upgrade mariadb
- 06:43 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2218.codfw.wmnet with reason: Maintenance
- 06:41 marostegui@dns1006: END - running authdns-update
- 06:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2218 T416555', diff saved to https://phabricator.wikimedia.org/P88742 and previous config saved to /var/cache/conftool/dbconfig/20260210-064045-marostegui.json
- 06:40 marostegui@dns1006: START - running authdns-update
- 06:39 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2220 to s7 primary and set section read-write T416555', diff saved to https://phabricator.wikimedia.org/P88741 and previous config saved to /var/cache/conftool/dbconfig/20260210-063939-marostegui.json
- 06:39 marostegui@cumin1003: dbctl commit (dc=all): 'Set s7 codfw as read-only for maintenance - T416555', diff saved to https://phabricator.wikimedia.org/P88740 and previous config saved to /var/cache/conftool/dbconfig/20260210-063916-marostegui.json
- 06:35 marostegui: Starting s7 codfw failover from db2218 to db2220 - T416555
- 06:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T416555
- 06:34 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2220 with weight 0 T416555', diff saved to https://phabricator.wikimedia.org/P88739 and previous config saved to /var/cache/conftool/dbconfig/20260210-063435-marostegui.json
- 05:23 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 05:23 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T410589)', diff saved to https://phabricator.wikimedia.org/P88738 and previous config saved to /var/cache/conftool/dbconfig/20260210-052303-ladsgroup.json
- 05:12 ladsgroup@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P88737 and previous config saved to /var/cache/conftool/dbconfig/20260210-051255-ladsgroup.json
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.15 refs T413806
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 38s)
- 02:01 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:07 zabe@deploy2002: Finished scap sync-world: Backport for Start reading from file table on testwikis (T416548) (duration: 06m 19s)
- 01:03 zabe@deploy2002: zabe: Continuing with sync
- 01:02 zabe@deploy2002: zabe: Backport for Start reading from file table on testwikis (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:00 zabe@deploy2002: Started scap sync-world: Backport for Start reading from file table on testwikis (T416548)
- 01:00 zabe@deploy2002: Sync cancelled.
- 00:59 zabe@deploy2002: zabe: Backport for Start reading from file table on testwikis (T416548) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:57 zabe@deploy2002: Started scap sync-world: Backport for Start reading from file table on testwikis (T416548)
- 00:20 zabe@deploy2002: Finished scap sync-world: Backport for Add config variable for MultiTitle (T404461), Reenable MostCategories on frwiki (T413362) (duration: 11m 41s)
- 00:13 zabe@deploy2002: tbodt, zabe: Continuing with sync
- 00:12 zabe@deploy2002: tbodt, zabe: Backport for Add config variable for MultiTitle (T404461), Reenable MostCategories on frwiki (T413362) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:08 zabe@deploy2002: Started scap sync-world: Backport for Add config variable for MultiTitle (T404461), Reenable MostCategories on frwiki (T413362)
- 00:06 zabe@deploy2002: Finished scap sync-world: Backport for Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334), Add MultiTitle to extension list (T404461) (duration: 39m 37s)
2026-02-09
- 23:53 zabe@deploy2002: tbodt, zabe: Continuing with sync
- 23:51 zabe@deploy2002: tbodt, zabe: Backport for Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334), Add MultiTitle to extension list (T404461) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 23:26 zabe@deploy2002: Started scap sync-world: Backport for Revert^2 "EditCheck: add instrumentation for checks seen during edit session" (T413419 T412334), Add MultiTitle to extension list (T404461)
- 22:25 cscott@deploy2002: Finished scap sync-world: Backport for Disable magic links on nlwiki (T145604), Turn on Parsoid read views by default on labs (T357054), Enable site notices on Minerva (T416644) (duration: 11m 20s)
- 22:21 cscott@deploy2002: jdlrobson, cscott: Continuing with sync
- 22:16 cscott@deploy2002: jdlrobson, cscott: Backport for Disable magic links on nlwiki (T145604), Turn on Parsoid read views by default on labs (T357054), Enable site notices on Minerva (T416644) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 22:14 cscott@deploy2002: Started scap sync-world: Backport for Disable magic links on nlwiki (T145604), Turn on Parsoid read views by default on labs (T357054), Enable site notices on Minerva (T416644)
- 22:05 zabe@deploy2002: Finished scap sync-world: Backport for Add alias for arwikibooks namespace, Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779) (duration: 37m 29s)
- 21:52 zabe@deploy2002: gergesshamon, zabe: Continuing with sync
- 21:51 zabe@deploy2002: gergesshamon, zabe: Backport for Add alias for arwikibooks namespace, Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:28 zabe@deploy2002: Started scap sync-world: Backport for Add alias for arwikibooks namespace, Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779)
- 21:23 zabe@deploy2002: sync-world aborted: T416779 (duration: 00m 43s)
- 21:22 zabe@deploy2002: Started scap sync-world: T416779
- 21:21 zabe@deploy2002: sync-world aborted: Backport for Add alias for arwikibooks namespace, Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779), EditCheck: add instrumentation for checks seen during edit session (T413419 T412334) (duration: 02m 38s)
- 21:18 zabe@deploy2002: Started scap sync-world: Backport for Add alias for arwikibooks namespace, Change wgSiteName and wgMetaNamespace for Arabic Wikibooks (ويكي الكتب => ويكي كتب). (T416779), EditCheck: add instrumentation for checks seen during edit session (T413419 T412334)
- 20:40 zabe@deploy2002: Finished scap sync-world: Backport for Use Hadoop for Mostcategories on testwiki (T413362) (duration: 06m 23s)
- 20:36 zabe@deploy2002: zabe: Continuing with sync
- 20:36 zabe@deploy2002: zabe: Backport for Use Hadoop for Mostcategories on testwiki (T413362) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:34 zabe@deploy2002: Started scap sync-world: Backport for Use Hadoop for Mostcategories on testwiki (T413362)
- 20:19 zabe@deploy2002: Finished scap sync-world: Backport for Configure Hadoop source for Mostcategories computations (T413362) (duration: 06m 51s)
- 20:15 zabe@deploy2002: zabe: Continuing with sync
- 20:14 zabe@deploy2002: zabe: Backport for Configure Hadoop source for Mostcategories computations (T413362) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:12 zabe@deploy2002: Started scap sync-world: Backport for Configure Hadoop source for Mostcategories computations (T413362)
- 19:53 fceratto@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dborch1002.wikimedia.org with OS trixie
- 19:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul1002.eqiad.wmnet with reason: WIP
- 19:42 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2002.codfw.wmnet with reason: WIP
- 19:34 hashar: restarting Gerrit to fix broken replication to GitHub # T416912
- 19:02 jasmine_: homer following T409103
- 19:01 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker2101.codfw.wmnet
- 19:01 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:01 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 19:00 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker2101.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:57 dzahn@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
- 18:56 dzahn@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
- 18:55 dzahn@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
- 18:54 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:54 dzahn@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
- 18:54 dzahn@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
- 18:53 dzahn@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
- 18:52 dzahn@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
- 18:49 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker2101.codfw.wmnet
- 18:49 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2096-2100].codfw.wmnet
- 18:49 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:49 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2096-2100].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:48 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2096-2100].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:45 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:42 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host dborch1002.wikimedia.org with OS trixie
- 18:32 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2096-2100].codfw.wmnet
- 18:32 dzahn@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
- 18:31 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2083-2084].codfw.wmnet
- 18:31 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:31 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2083-2084].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:31 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2083-2084].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:25 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:18 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2083-2084].codfw.wmnet
- 18:18 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2063,2079-2082].codfw.wmnet
- 18:18 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 18:18 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2063,2079-2082].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:18 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2063,2079-2082].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 18:14 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 18:01 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2063,2079-2082].codfw.wmnet
- 17:54 jasmine@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-worker[2052-2054].codfw.wmnet
- 17:54 jasmine@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:54 jasmine@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2052-2054].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 17:53 jasmine@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-worker[2052-2054].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jasmine@cumin2002"
- 17:50 jasmine@cumin2002: START - Cookbook sre.dns.netbox
- 17:42 jasmine@cumin2002: START - Cookbook sre.hosts.decommission for hosts wikikube-worker[2052-2054].codfw.wmnet
- 17:38 fceratto@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dborch1002.wikimedia.org with OS trixie
- 17:34 mutante: LDAP - added jacobthwaites to groups wmde and nda - T416358
- 17:32 jasmine@cumin2002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet
- 17:31 jasmine@cumin2002: START - Cookbook sre.k8s.pool-depool-node depool for host wikikube-worker[2052-2054,2063,2079-2084,2096-2101].codfw.wmnet
- 17:23 sukhe@dns1004: END - running authdns-update
- 17:21 sukhe@dns1004: START - running authdns-update
- 17:19 jgreen@dns1004: END - running authdns-update
- 17:19 otto@deploy2002: Finished scap sync-world: Backport for component: mediawiki.page_html_content_change.dev0 (T360794) (duration: 07m 34s)
- 17:18 jgreen@dns1004: START - running authdns-update
- 17:15 otto@deploy2002: otto, javiermonton: Continuing with sync
- 17:13 otto@deploy2002: otto, javiermonton: Backport for component: mediawiki.page_html_content_change.dev0 (T360794) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:11 otto@deploy2002: Started scap sync-world: Backport for component: mediawiki.page_html_content_change.dev0 (T360794)
- 16:50 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 16:39 moritzm: installing intel-microcode bugfix updates from Trixie point release
- 16:27 ladsgroup@cumin1003: dbctl commit (dc=all): 'Depooling db1167 (T410589)', diff saved to https://phabricator.wikimedia.org/P88732 and previous config saved to /var/cache/conftool/dbconfig/20260209-162720-ladsgroup.json
- 16:27 ladsgroup@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
- 16:26 fceratto@cumin1003: START - Cookbook sre.hosts.reimage for host dborch1002.wikimedia.org with OS trixie
- 16:08 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 16:08 urbanecm@deploy2002: Finished scap sync-world: Backport for GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171) (duration: 10m 02s)
- 16:06 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 16:04 urbanecm@deploy2002: ladsgroup, urbanecm: Continuing with sync
- 16:01 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 16:00 urbanecm@deploy2002: ladsgroup, urbanecm: Backport for GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 16:00 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:58 urbanecm@deploy2002: Started scap sync-world: Backport for GrowthExperimentsUserImpactUpdater: Do not compute data on every edit (T416171)
- 15:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:57 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:56 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns4004.wikimedia.org [reason: [end] bird2 upgrade]
- 15:55 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
- 15:54 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:54 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
- 15:54 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 15:54 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
- 15:54 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 15:54 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:53 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:53 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
- 15:53 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
- 15:53 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
- 15:51 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns4004.wikimedia.org [reason: bird2 upgrade]
- 15:51 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:49 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
- 15:47 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:46 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:46 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:45 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:45 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
- 15:44 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:44 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
- 15:44 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
- 15:44 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
- 15:44 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:42 fceratto@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=2) for new host dborch1002.wikimedia.org
- 15:42 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1002.wikimedia.org - fceratto@cumin1003"
- 15:42 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM dborch1002.wikimedia.org - fceratto@cumin1003"
- 15:42 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:42 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) dborch1002.wikimedia.org on all recursors
- 15:41 fceratto@cumin1003: START - Cookbook sre.dns.wipe-cache dborch1002.wikimedia.org on all recursors
- 15:41 fceratto@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:41 fceratto@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - fceratto@cumin1003"
- 15:41 fceratto@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM dborch1002.wikimedia.org - fceratto@cumin1003"
- 15:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 15:38 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wikidata: apply
- 15:38 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 15:38 fceratto@cumin1003: START - Cookbook sre.ganeti.makevm for new host dborch1002.wikimedia.org
- 15:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 15:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 15:36 fceratto@cumin1003: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host dborch1002.eqiad.wmnet
- 15:36 fceratto@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 15:33 fceratto@cumin1003: START - Cookbook sre.dns.netbox
- 15:32 fceratto@cumin1003: START - Cookbook sre.ganeti.makevm for new host dborch1002.eqiad.wmnet
- 15:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 15:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 15:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
- 15:28 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
- 15:27 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
- 15:26 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
- 15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-platform-eng: apply
- 15:23 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
- 15:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
- 15:17 ladsgroup@deploy2002: Finished scap sync-world: Backport for MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971) (duration: 07m 54s)
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
- 15:13 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 15:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
- 15:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:11 ladsgroup@deploy2002: ladsgroup: Backport for MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
- 15:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:11 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
- 15:11 ladsgroup@deploy2002: Started scap sync-world: Backport for MediaViewer: Adjust bucket sizes with the new thumb standard sizes (T412971)
- 15:02 Lucas_WMDE: UTC afternoon backport+config window done
- 15:02 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719), Test Kitchen renaming: Updated references to old names (T415843) (duration: 08m 16s)
- 14:58 lucaswerkmeister-wmde@deploy2002: jforrester, lucaswerkmeister-wmde, sfaci: Continuing with sync
- 14:56 lucaswerkmeister-wmde@deploy2002: jforrester, lucaswerkmeister-wmde, sfaci: Backport for Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719), Test Kitchen renaming: Updated references to old names (T415843) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:55 fceratto@deploy2002: helmfile [aux-k8s-eqiad] Ran 'sync' command on namespace 'zarcillo' for release 'main' .
- 14:54 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert^2 "EventStreamConfig: Bump product_metrics.web_base* streams to large size" (T416719), Test Kitchen renaming: Updated references to old names (T415843)
- 14:52 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable PageImages for hewikisource (T362851) (duration: 07m 41s)
- 14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jhsoby: Continuing with sync
- 14:46 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, jhsoby: Backport for Enable PageImages for hewikisource (T362851) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:44 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable PageImages for hewikisource (T362851)
- 14:43 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Rework InterwikiSortOrders.php (duration: 08m 24s)
- 14:39 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Continuing with sync
- 14:36 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Backport for Rework InterwikiSortOrders.php synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:34 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Rework InterwikiSortOrders.php
- 14:29 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Stop writing old for CheckUser user agent table migration on group0 (T361206) (duration: 10m 03s)
- 14:25 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
- 14:21 dreamyjazz@deploy2002: dreamyjazz: Backport for Stop writing old for CheckUser user agent table migration on group0 (T361206) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:19 dreamyjazz@deploy2002: Started scap sync-world: Backport for Stop writing old for CheckUser user agent table migration on group0 (T361206)
- 13:56 Amir1: ladsgroup@deploy2002:~$ mwscript-k8s --dblist=all -- purgeUserOptions.php --login-age 5 visualeditor-autodisable
- 13:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest2010.codfw.wmnet
- 13:47 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit1003.wikimedia.org
- 13:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest2010.codfw.wmnet
- 13:43 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit1003.wikimedia.org
- 13:42 arnaudb@cumin1003: END (PASS) - Cookbook sre.gerrit.localbackup (exit_code=0) Prepare local backup on: gerrit2003.wikimedia.org
- 13:34 arnaudb@cumin1003: START - Cookbook sre.gerrit.localbackup Prepare local backup on: gerrit2003.wikimedia.org
- 13:28 daniel@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/redioscope: apply
- 13:27 daniel@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/redioscope: apply
- 13:24 daniel@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/redioscope: apply
- 13:23 daniel@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/redioscope: apply
- 13:22 daniel@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/redioscope: apply
- 13:21 daniel@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/redioscope: apply
- 13:01 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 13:00 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 12:53 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 12:52 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 12:48 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 12:46 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 12:27 moritzm: upgrade centrallog1002 to Bird 2.18 T413740
- 12:18 moritzm: upgrade centrallog2002 to Bird 2.18 T413740
- 12:05 vgutierrez: rolling out host header validation in haproxy on magru, revert https://gerrit.wikimedia.org/r/c/operations/puppet/+/1237876 if needed
- 11:51 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host es1033.eqiad.wmnet
- 11:46 fceratto@deploy2002: helmfile [aux-k8s-eqiad] Ran 'sync' command on namespace 'zarcillo' for release 'main' .
- 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host es1033.eqiad.wmnet
- 11:29 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestagemaster2003.codfw.wmnet
- 11:29 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestagemaster2003.codfw.wmnet
- 11:28 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestagemaster2003.codfw.wmnet
- 11:23 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host kubestagemaster2003.codfw.wmnet
- 11:23 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) pool for host kubestage2001.codfw.wmnet
- 11:23 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node pool for host kubestage2001.codfw.wmnet
- 11:16 jayme@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage2001.codfw.wmnet
- 11:13 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestage2001.codfw.wmnet
- 11:10 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestage2001.codfw.wmnet
- 11:10 jayme@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) depool for host kubestagemaster2003.codfw.wmnet
- 11:10 jayme@cumin1003: START - Cookbook sre.k8s.pool-depool-node depool for host kubestagemaster2003.codfw.wmnet
- 11:08 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
- 11:08 jayme@cumin1003: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubestage2001.codfw.wmnet
- 11:07 jayme@cumin1003: START - Cookbook sre.hosts.reboot-single for host kubestage2001.codfw.wmnet
- 11:07 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 10:49 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) depool es1033: Will be depooled
- 10:49 marostegui@cumin1003: START - Cookbook sre.mysql.depool depool es1033: Will be depooled
- 10:33 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 10:32 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 10:17 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 10:17 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 10:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aux-k8s-worker1007
- 10:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1007
- 10:00 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1007
- 10:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1007.eqiad.wmnet 131.48.64.10.in-addr.arpa 1.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:00 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1007.eqiad.wmnet 131.48.64.10.in-addr.arpa 1.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 10:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 10:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aux-k8s-worker1007 - ayounsi@cumin1003"
- 10:00 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host aux-k8s-worker1007 - ayounsi@cumin1003"
- 09:55 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 09:55 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1007
- 09:55 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1007.eqiad.wmnet with OS bookworm
- 09:49 phuedx: End of UTC morning backport window
- 09:48 phuedx@deploy2002: Finished scap sync-world: Backport for metrics(ReviseTone): Use Experiment::send to send metrics (T416612), metrics(ReviseTone): send consistent experiment exposure event (T416199) (duration: 27m 34s)
- 09:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
- 09:42 phuedx@deploy2002: phuedx: Continuing with sync
- 09:41 jayme: kubectl delete node wikikube-worker2019.codfw.wmnet - T409102
- 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
- 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
- 09:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
- 09:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
- 09:29 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
- 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
- 09:25 phuedx@deploy2002: phuedx: Backport for metrics(ReviseTone): Use Experiment::send to send metrics (T416612), metrics(ReviseTone): send consistent experiment exposure event (T416199) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:21 phuedx@deploy2002: Started scap sync-world: Backport for metrics(ReviseTone): Use Experiment::send to send metrics (T416612), metrics(ReviseTone): send consistent experiment exposure event (T416199)
- 09:13 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 09:10 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on aux-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 09:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host aux-k8s-worker1006
- 09:00 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host aux-k8s-worker1006
- 08:59 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host aux-k8s-worker1006
- 08:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:59 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache aux-k8s-worker1006.eqiad.wmnet 132.48.64.10.in-addr.arpa 2.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 08:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 08:56 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 08:55 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1006
- 08:55 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
- 08:44 jforrester@deploy2002: Finished scap sync-world: Backport for [wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934) (duration: 37m 15s)
- 08:39 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2203: After schema change
- 08:31 jforrester@deploy2002: daphnesmit, jforrester: Continuing with sync
- 08:30 jforrester@deploy2002: daphnesmit, jforrester: Backport for [wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:25 brouberol@cumin1003: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM an-launcher1003.eqiad.wmnet
- 08:21 brouberol@cumin1003: START - Cookbook sre.ganeti.reboot-vm for VM an-launcher1003.eqiad.wmnet
- 08:19 brouberol@cumin1003: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-launcher1003.eqiad.wmnet
- 08:15 brouberol@cumin1003: START - Cookbook sre.hosts.reboot-single for host an-launcher1003.eqiad.wmnet
- 08:06 jforrester@deploy2002: Started scap sync-world: Backport for [wikifunctions] Grant sysops permission to edit function of attached implementation and tester (T399934)
- 07:54 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2203: After schema change
- 07:40 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Nettrom out of all services on: 2497 hosts
- 07:26 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
- 07:26 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host aux-k8s-worker1006
- 07:25 ayounsi@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 06:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Maintenance
- 06:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2203.codfw.wmnet with reason: Schema change
- 06:19 marostegui@dns1006: END - running authdns-update
- 06:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2203 T416554', diff saved to https://phabricator.wikimedia.org/P88725 and previous config saved to /var/cache/conftool/dbconfig/20260209-061904-marostegui.json
- 06:18 marostegui@dns1006: START - running authdns-update
- 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2212 to s1 primary and set section read-write T416554', diff saved to https://phabricator.wikimedia.org/P88724 and previous config saved to /var/cache/conftool/dbconfig/20260209-061756-marostegui.json
- 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Set s1 codfw as read-only for maintenance - T416554', diff saved to https://phabricator.wikimedia.org/P88723 and previous config saved to /var/cache/conftool/dbconfig/20260209-061732-marostegui.json
- 06:13 marostegui: Starting s1 codfw failover from db2203 to db2212 - T416554
- 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2212 with weight 0 T416554', diff saved to https://phabricator.wikimedia.org/P88722 and previous config saved to /var/cache/conftool/dbconfig/20260209-061218-marostegui.json
- 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T416554
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 52s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-08
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 01s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-07
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 52s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-06
- 18:09 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul2001.codfw.wmnet with reason: WIP
- 18:08 dzahn@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on zuul1001.eqiad.wmnet with reason: WIP
- 17:28 cdobbins@cumin2002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) rebooting P{lvs7003*} and A:liberica
- 17:25 cdobbins@cumin2002: START - Cookbook sre.loadbalancer.admin rebooting P{lvs7003*} and A:liberica
- 16:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 14:57 hashar@deploy2002: Finished scap sync-world: Backport for TypeError: Unsupported operand types: array + null (T416619) (duration: 11m 23s)
- 14:53 hashar@deploy2002: hashar: Continuing with sync
- 14:50 hashar@deploy2002: hashar: Backport for TypeError: Unsupported operand types: array + null (T416619) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:46 hashar@deploy2002: Started scap sync-world: Backport for TypeError: Unsupported operand types: array + null (T416619)
- 14:42 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:42 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-test: apply
- 14:38 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-test: apply
- 13:19 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 13:19 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 13:13 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 13:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 12:58 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:58 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:52 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-eqord
- 12:52 cmooney@cumin1003: START - Cookbook sre.network.tls for network device cr2-eqord
- 12:50 trueg@deploy2002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
- 12:49 trueg@deploy2002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e15b-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e15b-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e15a-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e15a-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e16b-eqiad
- 12:47 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e16b-eqiad
- 12:47 cmooney@cumin1003: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-e16a-eqiad
- 12:46 cmooney@cumin1003: START - Cookbook sre.network.tls for network device fasw2-e16a-eqiad
- 12:04 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:11 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update entries for private1-d8-eqiad gateway IPs - cmooney@cumin1003"
- 11:11 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update entries for private1-d8-eqiad gateway IPs - cmooney@cumin1003"
- 11:05 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 10:56 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab
- 10:27 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:26 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host aux-k8s-worker1006
- 10:26 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host aux-k8s-worker1006.eqiad.wmnet with OS bookworm
- 10:05 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS trixie
- 09:47 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 09:42 elukey@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 09:25 elukey@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS trixie
- 06:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 04:19 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 04:13 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 00s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:29 rzl@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/sophroid: apply
- 00:29 rzl@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/sophroid: apply
- 00:28 rzl@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aux-k8s-services/sophroid: apply
- 00:28 rzl@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/aux-k8s-services/sophroid: apply
2026-02-05
- 23:50 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 23:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 23:50 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 23:28 maryum: Deployed security fix for T410429
- 22:59 maryum: Deployed security fix for T416502
- 22:40 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:40 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:38 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 22:38 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 21:55 kemayo@deploy2002: Finished scap sync-world: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511) (duration: 08m 19s)
- 21:51 kemayo@deploy2002: kemayo: Continuing with sync
- 21:48 kemayo@deploy2002: kemayo: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:46 kemayo@deploy2002: Started scap sync-world: Backport for EditCheck: Adjust copy of experimental checks, TextMatchEditCheck: Place 'dismiss' action last, TextMatch: allow links in descriptions (T416511)
- 21:31 jdrewniak@deploy2002: Finished scap sync-world: Backport for Enable Extension:WP25EasterEggs on testwiki. (duration: 07m 45s)
- 21:27 jdrewniak@deploy2002: jdrewniak: Continuing with sync
- 21:25 jdrewniak@deploy2002: jdrewniak: Backport for Enable Extension:WP25EasterEggs on testwiki. synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:24 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:23 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 21:23 jdrewniak@deploy2002: Started scap sync-world: Backport for Enable Extension:WP25EasterEggs on testwiki.
- 21:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:22 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 21:22 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 21:21 jdrewniak@deploy2002: Finished scap sync-world: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435) (duration: 08m 16s)
- 21:17 jdrewniak@deploy2002: sfaci, jdrewniak: Continuing with sync
- 21:15 jdrewniak@deploy2002: sfaci, jdrewniak: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:13 jdrewniak@deploy2002: Started scap sync-world: Backport for Renaming `MetricsPlatform` => `TestKitchen` (T414435), readingListAB.js: Updated to use mw.testKitchen (T414435)
- 20:36 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:36 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 20:35 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:28 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 20:26 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 20:19 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop thumbnail pre-gen jobs altogether (T408062) (duration: 06m 29s)
- 20:15 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 20:14 ladsgroup@deploy2002: ladsgroup: Backport for Stop thumbnail pre-gen jobs altogether (T408062) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 20:12 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop thumbnail pre-gen jobs altogether (T408062)
- 20:02 phuedx@deploy2002: Finished scap sync-world: Backport for Fix instrument to not send when not in sample (duration: 09m 20s)
- 19:58 phuedx@deploy2002: phuedx, milimetric: Continuing with sync
- 19:55 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 19:54 phuedx@deploy2002: phuedx, milimetric: Backport for Fix instrument to not send when not in sample synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:53 phuedx@deploy2002: Started scap sync-world: Backport for Fix instrument to not send when not in sample
- 19:45 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 19:44 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 19:40 kamila@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 19:39 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:39 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:38 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:38 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:36 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:36 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:36 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:35 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:34 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:34 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "rename fmsw1-c1-eqiad to fmsw1-e15-eqiad - cmooney@cumin1003 - T403035"
- 19:28 brennen@deploy2002: Finished scap sync-world: Backport for Collect data four ways to find discrepancies (T416472) (duration: 10m 03s)
- 19:24 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on fasw2-e15a-eqiad,fasw2-e15b-eqiad with reason: fundraising migration eqiad
- 19:24 brennen@deploy2002: milimetric, brennen: Continuing with sync
- 19:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:23 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:20 brennen@deploy2002: milimetric, brennen: Backport for Collect data four ways to find discrepancies (T416472) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 19:18 brennen@deploy2002: Started scap sync-world: Backport for Collect data four ways to find discrepancies (T416472)
- 19:13 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:13 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-ipoid: apply
- 19:11 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 19:11 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 19:09 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:09 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change fasw2-c1 to fasw2-e15 to match new location - cmooney@cumin1003"
- 19:09 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change fasw2-c1 to fasw2-e15 to match new location - cmooney@cumin1003"
- 19:04 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 17:35 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pki1002.eqiad.wmnet with OS bullseye
- 17:18 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:15 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282) (duration: 14m 04s)
- 17:14 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on pki1002.eqiad.wmnet with reason: host reimage
- 17:11 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 17:03 ladsgroup@deploy2002: ladsgroup: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 17:01 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop relying on ThumbRenderMap and use a standard size instead (T415282), Stop relying on ThumbRenderMap and use a standard size instead (T415282)
- 16:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host pki1002
- 16:59 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host pki1002
- 16:58 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host pki1002
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki1002.eqiad.wmnet 44.32.64.10.in-addr.arpa 4.4.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:58 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache pki1002.eqiad.wmnet 44.32.64.10.in-addr.arpa 4.4.0.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 16:58 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host pki1002 - ayounsi@cumin1003"
- 16:58 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host pki1002 - ayounsi@cumin1003"
- 16:47 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 16:47 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host pki1002
- 16:46 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host pki1002.eqiad.wmnet with OS bullseye
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:42 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:42 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:38 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 16:37 topranks: deactivate BGP session from cr1-eqiad to pfw1a-eqiad fundraising migration T403035
- 16:32 akosiaris: manually sudo sysctl net.ipv4.conf.all.rp_filter=0 on tcp-proxy6001
- 16:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 16:23 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:23 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:23 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 16:19 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:17 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore1*.eqiad.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 16:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook-next: apply
- 16:14 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 16:01 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 16:01 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:59 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore1*.eqiad.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:55 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 15:47 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2*.codfw.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts puppetmaster2001.codfw.wmnet
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:40 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: puppetmaster2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
- 15:33 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
- 15:29 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2*.codfw.wmnet: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 15:29 javiermonton@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-page-html-content-change-enrich-next: apply
- 15:29 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:29 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:28 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:28 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts puppetmaster2001.codfw.wmnet
- 15:25 topranks: deactivate BGP session from cr2-eqiad to pfw1b-eqiad fundraising migration T403035
- 15:21 cmooney@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on fasw2-c1a-eqiad,fasw2-c1b-eqiad,pfw1-eqiad with reason: fundraising migration eqiad
- 15:19 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search-test: apply
- 15:13 bking@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 15:12 bking@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 15:06 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:05 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 15:04 bking@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 15:03 bking@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 14:55 cmooney@cumin1003: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-e16b-eqiad.mgmt.eqiad.wmnet
- 14:52 cmooney@cumin1003: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device fasw2-e16a-eqiad.mgmt.eqiad.wmnet
- 14:44 bking@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) opensearch-semantic-search.discovery.wmnet on all recursors
- 14:44 bking@cumin2002: START - Cookbook sre.dns.wipe-cache opensearch-semantic-search.discovery.wmnet on all recursors
- 14:42 bking@dns1004: END - running authdns-update
- 14:41 bking@dns1004: START - running authdns-update
- 14:21 Lucas_WMDE: UTC afternoon backport+config window done
- 14:16 phuedx@deploy2002: Finished scap sync-world: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code (duration: 11m 14s)
- 14:12 phuedx@deploy2002: phuedx: Continuing with sync
- 14:12 brouberol@dns1004: END - running authdns-update
- 14:11 brouberol@dns1004: START - running authdns-update
- 14:07 phuedx@deploy2002: phuedx: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:05 phuedx@deploy2002: Started scap sync-world: Backport for ext.wikimediaEvents: Add code for synth-aaa-test-mw-js experiment code
- 13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:49 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:46 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:46 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:43 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:43 cmooney@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 13:43 cmooney@cumin1003: START - Cookbook sre.network.provision for device fasw2-e16b-eqiad.mgmt.eqiad.wmnet
- 13:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) fasw2-e16b-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache fasw2-e16b-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) fasw2-e16a-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:42 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache fasw2-e16a-eqiad.mgmt.eqiad.wmnet on all recursors
- 13:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2204 gradually with 4 steps - After schema change
- 13:41 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 13:41 cmooney@cumin1003: START - Cookbook sre.network.provision for device fasw2-e16a-eqiad.mgmt.eqiad.wmnet
- 13:40 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: apply
- 13:02 taavi@dns1004: END - running authdns-update
- 13:00 taavi@dns1004: START - running authdns-update
- 12:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2204 gradually with 4 steps - After schema change
- 12:30 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 12:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 12:27 jmm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
- 12:26 jmm@deploy2002: helmfile [eqiad] START helmfile.d/services/proton: apply
- 12:24 jmm@deploy2002: helmfile [codfw] DONE helmfile.d/services/proton: apply
- 12:23 jmm@deploy2002: helmfile [codfw] START helmfile.d/services/proton: apply
- 12:18 jmm@deploy2002: helmfile [staging] DONE helmfile.d/services/proton: apply
- 12:17 jmm@deploy2002: helmfile [staging] START helmfile.d/services/proton: apply
- 12:01 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1005.eqiad.wmnet with OS bookworm
- 11:55 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen: apply
- 11:55 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen: apply
- 11:46 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1184: After schema change
- 11:43 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 11:39 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1005.eqiad.wmnet with reason: host reimage
- 11:34 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:33 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host sretest1005
- 11:21 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1005
- 11:20 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:20 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:20 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:20 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:16 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:15 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1005
- 11:15 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 11:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
- 11:11 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1002.eqiad.wmnet with OS bookworm
- 11:06 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest1005.eqiad.wmnet with OS bookworm
- 11:06 ayounsi@cumin1003: END (FAIL) - Cookbook sre.hosts.move-vlan (exit_code=99) for host sretest1005
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 3.141.64.10.in-addr.arpa 3.0.0.0.1.4.1.0.4.6.0.0.0.1.0.0.3.1.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:06 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 3.141.64.10.in-addr.arpa 3.0.0.0.1.4.1.0.4.6.0.0.0.1.0.0.3.1.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:06 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rollback records for host sretest1005 - ayounsi@cumin1003"
- 11:06 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Rollback records for host sretest1005 - ayounsi@cumin1003"
- 11:02 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 11:02 ayounsi@cumin1003: END (ERROR) - Cookbook sre.network.configure-switch-interfaces (exit_code=97) for host sretest1005
- 11:02 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1005
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:02 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1005.eqiad.wmnet 130.32.64.10.in-addr.arpa 0.3.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:02 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:02 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1005 - ayounsi@cumin1003"
- 11:01 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 11:01 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1184: After schema change
- 11:00 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:58 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 10:58 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1005
- 10:57 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1005.eqiad.wmnet with OS bookworm
- 10:54 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 10:48 moritzm: upgrade cloudcumin1001 to bookworm T403153
- 10:48 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 10:42 ayounsi@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1002.eqiad.wmnet with reason: host reimage
- 10:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.46.0-wmf.14 refs T413805
- 10:24 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:23 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 10:19 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:18 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 10:13 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 10:13 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 09:49 ammarpad@deploy2002: mwscript-k8s job started: refreshImageMetadata.php --wiki=commonswiki --mediatype=AUDIO --mime=application/ogg '--metadata-contains=Stream Undecodable' --force # T414348
- 09:48 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 09:45 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 09:45 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.14 refs T413805
- 09:44 ayounsi@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1002.eqiad.wmnet with OS bookworm
- 09:41 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2205 gradually with 4 steps - After schema change
- 09:39 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab replica
- 09:37 jelto@cumin1003: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 09:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host sretest1002
- 09:28 ayounsi@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host sretest1002
- 09:28 jelto@cumin1003: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab replica
- 09:27 ayounsi@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host sretest1002
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest1002.eqiad.wmnet 139.48.64.10.in-addr.arpa 9.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:27 ayounsi@cumin1003: START - Cookbook sre.dns.wipe-cache sretest1002.eqiad.wmnet 139.48.64.10.in-addr.arpa 9.3.1.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 09:27 ayounsi@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1002 - ayounsi@cumin1003"
- 09:27 ayounsi@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host sretest1002 - ayounsi@cumin1003"
- 09:23 ayounsi@cumin1003: START - Cookbook sre.dns.netbox
- 09:21 ayounsi@cumin1003: START - Cookbook sre.hosts.move-vlan for host sretest1002
- 09:21 ayounsi@cumin1003: START - Cookbook sre.hosts.reimage for host sretest1002.eqiad.wmnet with OS bookworm
- 09:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1184.eqiad.wmnet with reason: Maintenance
- 09:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1184.eqiad.wmnet with reason: Schema change
- 09:07 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1184 T416480', diff saved to https://phabricator.wikimedia.org/P88703 and previous config saved to /var/cache/conftool/dbconfig/20260205-090702-marostegui.json
- 09:06 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1163 to s1 primary T416480', diff saved to https://phabricator.wikimedia.org/P88702 and previous config saved to /var/cache/conftool/dbconfig/20260205-090623-marostegui.json
- 09:04 moritzm: update hosts running routed Ganeti to dnsmasq 2.92-1~wmf12u1 T396864
- 09:02 marostegui: Starting s1 eqiad failover from db1184 to db1163 - T416480
- 09:02 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s1 T416480
- 09:01 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1163 with weight 0 T416480', diff saved to https://phabricator.wikimedia.org/P88701 and previous config saved to /var/cache/conftool/dbconfig/20260205-090145-marostegui.json
- 08:58 jmm@dns1004: END - running authdns-update
- 08:57 jmm@dns1004: START - running authdns-update
- 08:56 marostegui@cumin1003: START - Cookbook sre.mysql.pool db2205 gradually with 4 steps - After schema change
- 08:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88698 and previous config saved to /var/cache/conftool/dbconfig/20260205-081536-marostegui.json
- 08:12 Msz2001: Morning backport window finished
- 08:11 mszwarc@deploy2002: Finished scap sync-world: Backport for Remove unused 'editor' right from plwiki (duration: 08m 33s)
- 08:07 mszwarc@deploy2002: matmarex, mszwarc: Continuing with sync
- 08:05 mszwarc@deploy2002: matmarex, mszwarc: Backport for Remove unused 'editor' right from plwiki synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:02 mszwarc@deploy2002: Started scap sync-world: Backport for Remove unused 'editor' right from plwiki
- 08:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P88697 and previous config saved to /var/cache/conftool/dbconfig/20260205-080027-marostegui.json
- 07:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222', diff saved to https://phabricator.wikimedia.org/P88696 and previous config saved to /var/cache/conftool/dbconfig/20260205-074519-marostegui.json
- 07:42 moritzm: installing openjdk-21 security updates
- 07:36 moritzm: installing openjdk-25 security updates
- 07:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88695 and previous config saved to /var/cache/conftool/dbconfig/20260205-073011-marostegui.json
- 06:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2205.codfw.wmnet with reason: Maintenance
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2205.codfw.wmnet with reason: Schema change
- 06:32 marostegui@dns1006: END - running authdns-update
- 06:31 marostegui@dns1006: START - running authdns-update
- 06:27 marostegui@dns1006: END - running authdns-update
- 06:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2205 T416299', diff saved to https://phabricator.wikimedia.org/P88694 and previous config saved to /var/cache/conftool/dbconfig/20260205-062737-marostegui.json
- 06:26 marostegui@dns1006: START - running authdns-update
- 06:26 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2209 to s3 primary and set section read-write T416299', diff saved to https://phabricator.wikimedia.org/P88693 and previous config saved to /var/cache/conftool/dbconfig/20260205-062617-marostegui.json
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Set s3 codfw as read-only for maintenance - T416299', diff saved to https://phabricator.wikimedia.org/P88692 and previous config saved to /var/cache/conftool/dbconfig/20260205-062557-marostegui.json
- 06:23 marostegui: Starting s3 codfw failover from db2205 to db2209 - T416299
- 06:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T416299
- 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2209 with weight 0 T416299', diff saved to https://phabricator.wikimedia.org/P88691 and previous config saved to /var/cache/conftool/dbconfig/20260205-062215-marostegui.json
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2222 (T415786)', diff saved to https://phabricator.wikimedia.org/P88690 and previous config saved to /var/cache/conftool/dbconfig/20260205-060031-marostegui.json
- 06:00 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2222.codfw.wmnet with reason: Maintenance
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88689 and previous config saved to /var/cache/conftool/dbconfig/20260205-060006-marostegui.json
- 05:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P88688 and previous config saved to /var/cache/conftool/dbconfig/20260205-054457-marostegui.json
- 05:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221', diff saved to https://phabricator.wikimedia.org/P88687 and previous config saved to /var/cache/conftool/dbconfig/20260205-052949-marostegui.json
- 05:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88686 and previous config saved to /var/cache/conftool/dbconfig/20260205-051441-marostegui.json
- 03:44 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2221 (T415786)', diff saved to https://phabricator.wikimedia.org/P88685 and previous config saved to /var/cache/conftool/dbconfig/20260205-034435-marostegui.json
- 03:44 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2221.codfw.wmnet with reason: Maintenance
- 03:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88684 and previous config saved to /var/cache/conftool/dbconfig/20260205-034410-marostegui.json
- 03:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P88683 and previous config saved to /var/cache/conftool/dbconfig/20260205-032902-marostegui.json
- 03:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P88682 and previous config saved to /var/cache/conftool/dbconfig/20260205-031354-marostegui.json
- 02:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88681 and previous config saved to /var/cache/conftool/dbconfig/20260205-025845-marostegui.json
- 02:40 samwilson@deploy2002: Finished scap sync-world: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231) (duration: 07m 20s)
- 02:36 samwilson@deploy2002: samwilson, bhsd: Continuing with sync
- 02:35 samwilson@deploy2002: samwilson, bhsd: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 02:33 samwilson@deploy2002: Started scap sync-world: Backport for Revert "Support WikiEditor's resizing drag bar for Page editing" (T393231)
- 02:23 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 22m 21s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:29 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2220 (T415786)', diff saved to https://phabricator.wikimedia.org/P88680 and previous config saved to /var/cache/conftool/dbconfig/20260205-012942-marostegui.json
- 01:29 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2220.codfw.wmnet with reason: Maintenance
- 01:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88679 and previous config saved to /var/cache/conftool/dbconfig/20260205-012918-marostegui.json
- 01:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P88678 and previous config saved to /var/cache/conftool/dbconfig/20260205-011410-marostegui.json
- 01:06 samwilson@deploy2002: Finished scap sync-world: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231) (duration: 08m 21s)
- 01:02 samwilson@deploy2002: samwilson: Continuing with sync
- 01:00 samwilson@deploy2002: samwilson: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P88677 and previous config saved to /var/cache/conftool/dbconfig/20260205-005902-marostegui.json
- 00:57 samwilson@deploy2002: Started scap sync-world: Backport for jquery.wikiEditor.js: disable resizing bar on proofread-page (T393231)
- 00:51 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 00:51 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 00:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88676 and previous config saved to /var/cache/conftool/dbconfig/20260205-004353-marostegui.json
- 00:36 reedy@deploy2002: Finished scap sync-world: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456) (duration: 06m 50s)
- 00:32 reedy@deploy2002: reedy, zabe: Continuing with sync
- 00:32 reedy@deploy2002: reedy, zabe: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 00:30 reedy@deploy2002: Started scap sync-world: Backport for Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456), Revert "Updated lcobucci/jwt from 4.1.5 to 4.3.0" (T416456)
2026-02-04
- 23:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2208 (T415786)', diff saved to https://phabricator.wikimedia.org/P88674 and previous config saved to /var/cache/conftool/dbconfig/20260204-231600-marostegui.json
- 23:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2208.codfw.wmnet with reason: Maintenance
- 23:10 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 22:28 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:27 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:27 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:26 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/dse-k8s-services/opensearch-semantic-search: apply
- 22:03 tgr_: late UTC deploys done
- 22:02 tgr@deploy2002: Finished scap sync-world: Backport for Migrate EmailAuth config, step 1 (T404334) (duration: 11m 28s)
- 21:56 tgr@deploy2002: tgr: Continuing with sync
- 21:55 tgr@deploy2002: tgr: Backport for Migrate EmailAuth config, step 1 (T404334) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:51 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2200.codfw.wmnet with reason: Maintenance
- 21:51 tgr@deploy2002: Started scap sync-world: Backport for Migrate EmailAuth config, step 1 (T404334)
- 21:47 dancy@deploy2002: Finished scap sync-world: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588) (duration: 40m 00s)
- 21:34 dancy@deploy2002: matmarex, dancy: Continuing with sync
- 21:34 dancy@deploy2002: matmarex, dancy: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 21:07 dancy@deploy2002: Started scap sync-world: Backport for Add messages for 'local-bot' global group (T415588), Add messages for 'local-bot' global group (T415588)
- 21:06 urandom: restart Cassandra to apply Java 11.0.30 upgrade, restbase/codfw — T416492
- 21:06 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Applying upgrade to Java 11.0.30 — T416492 - eevans@cumin1003
- 20:52 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 20:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2198.codfw.wmnet with reason: Maintenance
- 20:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88673 and previous config saved to /var/cache/conftool/dbconfig/20260204-200512-marostegui.json
- 19:52 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 19:52 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 19:51 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 19:51 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 19:50 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 19:50 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 19:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P88672 and previous config saved to /var/cache/conftool/dbconfig/20260204-195004-marostegui.json
- 19:47 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:47 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for fasw2-e16-eqiad pair - cmooney@cumin1003"
- 19:47 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns entries for fasw2-e16-eqiad pair - cmooney@cumin1003"
- 19:43 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P88671 and previous config saved to /var/cache/conftool/dbconfig/20260204-193455-marostegui.json
- 19:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88670 and previous config saved to /var/cache/conftool/dbconfig/20260204-191947-marostegui.json
- 18:48 urandom: restart Cassandra to apply Java 11.0.30 upgrade, restbase/eqiad — T416492
- 18:47 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 18:29 daniel@deploy2002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
- 18:29 daniel@deploy2002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
- 18:22 dzahn@dns1004: END - running authdns-update
- 18:21 daniel@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
- 18:21 eevans@cumin1003: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[2001,2003-2012,1011-1021]*: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 18:21 dzahn@dns1004: START - running authdns-update
- 18:21 daniel@deploy2002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
- 18:20 dzahn@dns1004: END - running authdns-update
- 18:19 dzahn@dns1004: START - running authdns-update
- 17:44 dwisehaupt@dns1004: END - running authdns-update
- 17:42 dwisehaupt@dns1004: START - running authdns-update
- 17:41 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1004.wikimedia.org with OS trixie
- 17:36 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2182 (T415786)', diff saved to https://phabricator.wikimedia.org/P88668 and previous config saved to /var/cache/conftool/dbconfig/20260204-173612-marostegui.json
- 17:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2182.codfw.wmnet with reason: Maintenance
- 17:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88667 and previous config saved to /var/cache/conftool/dbconfig/20260204-173547-marostegui.json
- 17:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 17:21 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 17:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P88666 and previous config saved to /var/cache/conftool/dbconfig/20260204-172039-marostegui.json
- 17:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P88665 and previous config saved to /var/cache/conftool/dbconfig/20260204-170530-marostegui.json
- 17:03 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 17:02 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1004.wikimedia.org with OS trixie
- 17:02 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 17:02 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 16:55 urbanecm@deploy2002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
- 16:53 urbanecm@deploy2002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
- 16:52 urbanecm@deploy2002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
- 16:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88664 and previous config saved to /var/cache/conftool/dbconfig/20260204-165022-marostegui.json
- 16:50 urbanecm@deploy2002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
- 16:49 urbanecm@deploy2002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
- 16:47 urbanecm@deploy2002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
- 16:44 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 16:39 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1004.wikimedia.org with reason: host reimage
- 16:34 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1236 gradually with 4 steps - After schema change
- 16:22 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 16:18 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast1004.wikimedia.org with OS trixie
- 16:18 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.wikimedia.org with OS trixie
- 16:13 Amir1: bumping rate limit of non-standard thumb sizes to medium browser score (T402792 T414805)
- 15:56 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host bast1004
- 15:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host bast1004
- 15:54 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 15:54 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 15:54 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 15:51 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging ChandraWMDE out of all services on: 2497 hosts
- 15:50 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 15:49 jclark@cumin1003: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
- 15:48 marostegui@cumin1003: START - Cookbook sre.mysql.pool db1236 gradually with 4 steps - After schema change
- 15:47 eevans@cumin1003: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[2001,2003-2012,1011-1021]*: Applying upgrade to Java 11.0.30 - eevans@cumin1003
- 15:47 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 15:46 jclark@cumin1003: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host bast1004
- 15:46 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host bast1004
- 15:39 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org [reason: [end] bird2 upgrade]
- 15:39 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:38 urandom: restarting Cassandra on aqs[2001,2003-2012] & aqs[1011,1014-1027] to apply Java 11.0.30 - T416492
- 15:34 sukhe: upgrade to bird 2.18 on dns4003: T413740
- 15:34 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host bast1004.eqiad.wmnet with OS trixie
- 15:33 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host bast1004.eqiad.wmnet with OS trixie
- 15:32 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org [reason: bird2 upgrade]
- 15:29 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
- 15:28 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
- 15:26 urbanecm@deploy2002: Finished scap sync-world: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080) (duration: 16m 20s)
- 15:21 urbanecm@deploy2002: urbanecm: Continuing with sync
- 15:12 urandom: restarting Cassandra on [aqs2002.codfw.wmnet,aqs1010.eqiad.wmnet] to canary Java 11.0.30 — T416492
- 15:12 urbanecm@deploy2002: urbanecm: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 15:10 urbanecm@deploy2002: Started scap sync-world: Backport for Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080), Revert "DatabaseUserImpactStore: log attempts to save zero pageviews values" (T414080)
- 15:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2168 (T415786)', diff saved to https://phabricator.wikimedia.org/P88657 and previous config saved to /var/cache/conftool/dbconfig/20260204-150138-marostegui.json
- 15:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2168.codfw.wmnet with reason: Maintenance
- 15:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88656 and previous config saved to /var/cache/conftool/dbconfig/20260204-150124-marostegui.json
- 14:54 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:54 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:54 daniel@deploy2002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
- 14:54 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1236.eqiad.wmnet with reason: Maintenance
- 14:53 daniel@deploy2002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
- 14:53 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:53 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:52 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 14:52 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1236.eqiad.wmnet with reason: Schema change
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1236 T416356', diff saved to https://phabricator.wikimedia.org/P88655 and previous config saved to /var/cache/conftool/dbconfig/20260204-144951-marostegui.json
- 14:49 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1181 to s7 primary T416356', diff saved to https://phabricator.wikimedia.org/P88654 and previous config saved to /var/cache/conftool/dbconfig/20260204-144914-marostegui.json
- 14:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P88653 and previous config saved to /var/cache/conftool/dbconfig/20260204-144616-marostegui.json
- 14:46 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T416356
- 14:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T416356
- 14:45 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1181 with weight 0 T416356', diff saved to https://phabricator.wikimedia.org/P88652 and previous config saved to /var/cache/conftool/dbconfig/20260204-144508-marostegui.json
- 14:43 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:37 urbanecm@deploy2002: Finished scap sync-world: Backport for Add client.tag_metadata_categories field support (T414571) (duration: 09m 26s)
- 14:36 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1022.eqiad.wmnet with reason: host reimage
- 14:34 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1023.eqiad.wmnet with reason: host reimage
- 14:33 urbanecm@deploy2002: kharlan, urbanecm: Continuing with sync
- 14:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P88651 and previous config saved to /var/cache/conftool/dbconfig/20260204-143108-marostegui.json
- 14:30 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1022.eqiad.wmnet with reason: host reimage
- 14:30 urbanecm@deploy2002: kharlan, urbanecm: Backport for Add client.tag_metadata_categories field support (T414571) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:30 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1023.eqiad.wmnet with reason: host reimage
- 14:27 urbanecm@deploy2002: Started scap sync-world: Backport for Add client.tag_metadata_categories field support (T414571)
- 14:27 moritzm: remove legacy kibana discovery certificate T365798
- 14:24 urbanecm@deploy2002: Finished scap sync-world: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPRepu]]
- 14:20 sukhe: sudo cumin -b1 -s5 "A:dnsbox" "run-puppet-agent --enable 'merging CR 1228560'"
- 14:20 urbanecm@deploy2002: hartman, kharlan, urbanecm: Continuing with sync
- 14:18 urbanecm@deploy2002: hartman, kharlan, urbanecm: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPRe]]
- 14:16 urbanecm@deploy2002: Started scap sync-world: Backport for Fix audio transcodes, DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), DatabaseUserImpactStore: log attempts to save zero pageviews values (T414080), IPReputationIPoidDataLookup: Allow returning stale values for 72 hours (T416316), [[gerrit:1236689|IPReput]]
- 14:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88650 and previous config saved to /var/cache/conftool/dbconfig/20260204-141559-marostegui.json
- 14:15 sukhe: sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1228560'"
- 14:14 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:14 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:13 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 14:08 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1022.eqiad.wmnet with OS bullseye
- 14:06 moritzm: installing php7.4 security updates
- 14:03 dpogorzelski@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 14:01 dpogorzelski@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:52 moritzm: disable nrpe2nodexp check for ferm on cloudcumin*
- 13:46 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1022.eqiad.wmnet with OS bullseye
- 13:46 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:37 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 13:37 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:30 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
- 13:28 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:26 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:24 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: OpenJDK 11 security updates - jmm@cumin2002
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-fe1021.eqiad.wmnet with OS bullseye
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:07 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 13:06 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: OpenJDK 11 security updates - jmm@cumin2002
- 13:05 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1024.eqiad.wmnet with reason: host reimage
- 13:02 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: OpenJDK 11 security updates - jmm@cumin2002
- 13:00 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 13:00 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 13:00 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 13:00 moritzm: remove legacy wdqs-internal discovery certificate T365798
- 13:00 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1023.eqiad.wmnet with OS bullseye
- 12:58 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1024.eqiad.wmnet with reason: host reimage
- 12:53 moritzm: remove legacy eventstreams-internal discovery certificate T365798
- 12:44 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-fe1021.eqiad.wmnet with reason: host reimage
- 12:44 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: OpenJDK 11 security updates - jmm@cumin2002
- 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: OpenJDK 11 security updates - jmm@cumin2002
- 12:42 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 12:42 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 12:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:41 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:40 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-fe1021.eqiad.wmnet with reason: host reimage
- 12:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2159 (T415786)', diff saved to https://phabricator.wikimedia.org/P88648 and previous config saved to /var/cache/conftool/dbconfig/20260204-123308-marostegui.json
- 12:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2159.codfw.wmnet with reason: Maintenance
- 12:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88647 and previous config saved to /var/cache/conftool/dbconfig/20260204-123243-marostegui.json
- 12:32 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
- 12:31 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/zotero: apply
- 12:23 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1021.eqiad.wmnet with OS bullseye
- 12:23 jmm@cumin2002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: OpenJDK 11 security updates - jmm@cumin2002
- 12:22 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
- 12:21 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/zotero: apply
- 12:18 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/zotero: apply
- 12:18 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/zotero: apply
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P88646 and previous config saved to /var/cache/conftool/dbconfig/20260204-121735-marostegui.json
- 12:07 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup1015.eqiad.wmnet with OS trixie
- 12:07 jynus@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jynus@cumin1003"
- 12:06 jynus@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jynus@cumin1003"
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P88645 and previous config saved to /var/cache/conftool/dbconfig/20260204-120227-marostegui.json
- 11:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88644 and previous config saved to /var/cache/conftool/dbconfig/20260204-114718-marostegui.json
- 11:45 jynus@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1015.eqiad.wmnet with reason: host reimage
- 11:42 moritzm: installing openjdk-11 security updates
- 11:41 jynus@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1015.eqiad.wmnet with reason: host reimage
- 11:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
- 11:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 11:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88643 and previous config saved to /var/cache/conftool/dbconfig/20260204-113854-marostegui.json
- 11:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: sync
- 11:35 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
- 11:35 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
- 11:32 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
- 11:32 elukey@deploy2002: helmfile [staging] START helmfile.d/services/thumbor: sync
- 11:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P88642 and previous config saved to /var/cache/conftool/dbconfig/20260204-112846-marostegui.json
- 11:26 jynus@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS trixie
- 11:23 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:22 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251', diff saved to https://phabricator.wikimedia.org/P88641 and previous config saved to /var/cache/conftool/dbconfig/20260204-111837-marostegui.json
- 11:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88640 and previous config saved to /var/cache/conftool/dbconfig/20260204-110829-marostegui.json
- 10:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
- 10:47 moritzm: installing openjdk-17 security updates
- 10:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
- 10:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88639 and previous config saved to /var/cache/conftool/dbconfig/20260204-104035-marostegui.json
- 10:39 moritzm: upgrade cloudcumin2001 to bookworm T403153
- 10:36 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.14 refs T413805
- 10:29 hashar: Rolling back to group0 due to an issue with OAuth on metawiki # T413805
- 10:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P88638 and previous config saved to /var/cache/conftool/dbconfig/20260204-102527-marostegui.json
- 10:11 hashar: Restarted CI Jenkins
- 10:10 hashar: Gerrit is back
- 10:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P88637 and previous config saved to /var/cache/conftool/dbconfig/20260204-101018-marostegui.json
- 10:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2150 (T415786)', diff saved to https://phabricator.wikimedia.org/P88636 and previous config saved to /var/cache/conftool/dbconfig/20260204-100638-marostegui.json
- 10:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2150.codfw.wmnet with reason: Maintenance
- 10:06 hashar: Restarting Gerrit instances
- 09:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88635 and previous config saved to /var/cache/conftool/dbconfig/20260204-095510-marostegui.json
- 09:38 moritzm: installing openssl security updates
- 09:37 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.46.0-wmf.14 refs T413805
- 09:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1251 (T415786)', diff saved to https://phabricator.wikimedia.org/P88634 and previous config saved to /var/cache/conftool/dbconfig/20260204-093421-marostegui.json
- 09:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1251.eqiad.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
- 09:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88632 and previous config saved to /var/cache/conftool/dbconfig/20260204-091015-marostegui.json
- 08:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P88631 and previous config saved to /var/cache/conftool/dbconfig/20260204-085506-marostegui.json
- 08:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253', diff saved to https://phabricator.wikimedia.org/P88630 and previous config saved to /var/cache/conftool/dbconfig/20260204-083958-marostegui.json
- 08:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88629 and previous config saved to /var/cache/conftool/dbconfig/20260204-082450-marostegui.json
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2216 (T415786)', diff saved to https://phabricator.wikimedia.org/P88628 and previous config saved to /var/cache/conftool/dbconfig/20260204-082324-marostegui.json
- 08:23 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2216.codfw.wmnet with reason: Maintenance
- 08:23 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88627 and previous config saved to /var/cache/conftool/dbconfig/20260204-082259-marostegui.json
- 08:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 08:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P88626 and previous config saved to /var/cache/conftool/dbconfig/20260204-080751-marostegui.json
- 07:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P88625 and previous config saved to /var/cache/conftool/dbconfig/20260204-075243-marostegui.json
- 07:39 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 07:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88624 and previous config saved to /var/cache/conftool/dbconfig/20260204-073735-marostegui.json
- 07:35 marostegui: Deploy schema change on db2204 (old s2 codfw master) T415786
- 07:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2204.codfw.wmnet with reason: Schema change
- 07:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1253 (T415786)', diff saved to https://phabricator.wikimedia.org/P88623 and previous config saved to /var/cache/conftool/dbconfig/20260204-072658-marostegui.json
- 07:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1253.eqiad.wmnet with reason: Maintenance
- 07:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88622 and previous config saved to /var/cache/conftool/dbconfig/20260204-072632-marostegui.json
- 07:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P88621 and previous config saved to /var/cache/conftool/dbconfig/20260204-071124-marostegui.json
- 06:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P88620 and previous config saved to /var/cache/conftool/dbconfig/20260204-065616-marostegui.json
- 06:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1239.eqiad.wmnet with reason: Maintenance
- 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88619 and previous config saved to /var/cache/conftool/dbconfig/20260204-064118-marostegui.json
- 06:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88618 and previous config saved to /var/cache/conftool/dbconfig/20260204-064107-marostegui.json
- 06:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P88617 and previous config saved to /var/cache/conftool/dbconfig/20260204-063103-marostegui.json
- 06:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P88616 and previous config saved to /var/cache/conftool/dbconfig/20260204-062055-marostegui.json
- 06:18 marostegui@dns1006: END - running authdns-update
- 06:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2204 T416300', diff saved to https://phabricator.wikimedia.org/P88615 and previous config saved to /var/cache/conftool/dbconfig/20260204-061739-marostegui.json
- 06:17 marostegui@dns1006: START - running authdns-update
- 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2207 to s2 primary and set section read-write T416300', diff saved to https://phabricator.wikimedia.org/P88614 and previous config saved to /var/cache/conftool/dbconfig/20260204-061637-marostegui.json
- 06:16 marostegui@cumin1003: dbctl commit (dc=all): 'Set s2 codfw as read-only for maintenance - T416300', diff saved to https://phabricator.wikimedia.org/P88613 and previous config saved to /var/cache/conftool/dbconfig/20260204-061613-marostegui.json
- 06:13 marostegui: Starting s2 codfw failover from db2204 to db2207 - T416300
- 06:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T416300
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2207 with weight 0 T416300', diff saved to https://phabricator.wikimedia.org/P88612 and previous config saved to /var/cache/conftool/dbconfig/20260204-061122-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88611 and previous config saved to /var/cache/conftool/dbconfig/20260204-061047-marostegui.json
- 06:05 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88610 and previous config saved to /var/cache/conftool/dbconfig/20260204-060516-marostegui.json
- 06:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2212.codfw.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1231 (T415786)', diff saved to https://phabricator.wikimedia.org/P88609 and previous config saved to /var/cache/conftool/dbconfig/20260204-054542-marostegui.json
- 05:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1231.eqiad.wmnet with reason: Maintenance
- 05:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88608 and previous config saved to /var/cache/conftool/dbconfig/20260204-054518-marostegui.json
- 05:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P88607 and previous config saved to /var/cache/conftool/dbconfig/20260204-053009-marostegui.json
- 05:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P88606 and previous config saved to /var/cache/conftool/dbconfig/20260204-051501-marostegui.json
- 04:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88605 and previous config saved to /var/cache/conftool/dbconfig/20260204-045953-marostegui.json
- 04:41 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2202.codfw.wmnet with reason: Maintenance
- 04:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88604 and previous config saved to /var/cache/conftool/dbconfig/20260204-044137-marostegui.json
- 04:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1235 (T415786)', diff saved to https://phabricator.wikimedia.org/P88603 and previous config saved to /var/cache/conftool/dbconfig/20260204-044022-marostegui.json
- 04:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1235.eqiad.wmnet with reason: Maintenance
- 04:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88602 and previous config saved to /var/cache/conftool/dbconfig/20260204-043958-marostegui.json
- 04:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P88601 and previous config saved to /var/cache/conftool/dbconfig/20260204-042950-marostegui.json
- 04:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P88600 and previous config saved to /var/cache/conftool/dbconfig/20260204-042629-marostegui.json
- 04:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P88599 and previous config saved to /var/cache/conftool/dbconfig/20260204-041941-marostegui.json
- 04:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P88598 and previous config saved to /var/cache/conftool/dbconfig/20260204-041121-marostegui.json
- 04:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88597 and previous config saved to /var/cache/conftool/dbconfig/20260204-040933-marostegui.json
- 03:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88596 and previous config saved to /var/cache/conftool/dbconfig/20260204-035612-marostegui.json
- 03:31 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88595 and previous config saved to /var/cache/conftool/dbconfig/20260204-033110-marostegui.json
- 03:31 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1227.eqiad.wmnet with reason: Maintenance
- 03:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88594 and previous config saved to /var/cache/conftool/dbconfig/20260204-033046-marostegui.json
- 03:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P88593 and previous config saved to /var/cache/conftool/dbconfig/20260204-031537-marostegui.json
- 03:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P88592 and previous config saved to /var/cache/conftool/dbconfig/20260204-030029-marostegui.json
- 02:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88591 and previous config saved to /var/cache/conftool/dbconfig/20260204-024521-marostegui.json
- 02:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1234 (T415786)', diff saved to https://phabricator.wikimedia.org/P88590 and previous config saved to /var/cache/conftool/dbconfig/20260204-023659-marostegui.json
- 02:36 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1234.eqiad.wmnet with reason: Maintenance
- 02:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88589 and previous config saved to /var/cache/conftool/dbconfig/20260204-023634-marostegui.json
- 02:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2188 (T415786)', diff saved to https://phabricator.wikimedia.org/P88588 and previous config saved to /var/cache/conftool/dbconfig/20260204-022717-marostegui.json
- 02:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2188.codfw.wmnet with reason: Maintenance
- 02:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88587 and previous config saved to /var/cache/conftool/dbconfig/20260204-022652-marostegui.json
- 02:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P88586 and previous config saved to /var/cache/conftool/dbconfig/20260204-022626-marostegui.json
- 02:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P88585 and previous config saved to /var/cache/conftool/dbconfig/20260204-021617-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 50s)
- 02:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P88584 and previous config saved to /var/cache/conftool/dbconfig/20260204-021144-marostegui.json
- 02:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88583 and previous config saved to /var/cache/conftool/dbconfig/20260204-020609-marostegui.json
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 01:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P88582 and previous config saved to /var/cache/conftool/dbconfig/20260204-015635-marostegui.json
- 01:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88581 and previous config saved to /var/cache/conftool/dbconfig/20260204-014127-marostegui.json
- 01:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1202 (T415786)', diff saved to https://phabricator.wikimedia.org/P88580 and previous config saved to /var/cache/conftool/dbconfig/20260204-013958-marostegui.json
- 01:39 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1202.eqiad.wmnet with reason: Maintenance
- 01:39 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88579 and previous config saved to /var/cache/conftool/dbconfig/20260204-013944-marostegui.json
- 01:36 ladsgroup@deploy2002: Finished scap sync-world: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080) (duration: 10m 38s)
- 01:29 ladsgroup@deploy2002: ladsgroup: Continuing with sync
- 01:29 ladsgroup@deploy2002: ladsgroup: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 01:27 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1006.eqiad.wmnet with OS trixie
- 01:27 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:26 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:25 ladsgroup@deploy2002: Started scap sync-world: Backport for UserImpact: Remove zeros in per-article view stats (T414080), UserImpact: Remove zeros in per-article view stats (T414080)
- 01:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P88578 and previous config saved to /var/cache/conftool/dbconfig/20260204-012436-marostegui.json
- 01:24 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1008.eqiad.wmnet with OS trixie
- 01:24 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:23 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:20 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1007.eqiad.wmnet with OS trixie
- 01:20 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:19 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:15 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1005.eqiad.wmnet with OS trixie
- 01:15 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:15 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1003"
- 01:10 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 01:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P88577 and previous config saved to /var/cache/conftool/dbconfig/20260204-010928-marostegui.json
- 01:07 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 01:03 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 01:01 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1008.eqiad.wmnet with reason: host reimage
- 00:59 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 00:57 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1007.eqiad.wmnet with reason: host reimage
- 00:56 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1006.eqiad.wmnet with reason: host reimage
- 00:55 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1005.eqiad.wmnet with reason: host reimage
- 00:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88576 and previous config saved to /var/cache/conftool/dbconfig/20260204-005419-marostegui.json
- 00:50 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1008.eqiad.wmnet with OS trixie
- 00:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1004.eqiad.wmnet with OS trixie
- 00:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:46 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1007.eqiad.wmnet with OS trixie
- 00:45 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:45 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1003.eqiad.wmnet with OS trixie
- 00:45 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1006.eqiad.wmnet with OS trixie
- 00:44 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1005.eqiad.wmnet with OS trixie
- 00:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:42 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1002.eqiad.wmnet with OS trixie
- 00:41 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:40 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host tools-k8s-worker1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:38 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1007.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:37 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-worker1001.eqiad.wmnet with OS trixie
- 00:35 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1232 (T415786)', diff saved to https://phabricator.wikimedia.org/P88575 and previous config saved to /var/cache/conftool/dbconfig/20260204-003551-marostegui.json
- 00:35 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1232.eqiad.wmnet with reason: Maintenance
- 00:35 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88574 and previous config saved to /var/cache/conftool/dbconfig/20260204-003526-marostegui.json
- 00:34 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1006.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:33 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 00:33 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie
- 00:33 jclark@cumin1003: START - Cookbook sre.hosts.provision for host tools-k8s-worker1005.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
- 00:30 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie
- 00:29 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 00:25 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 00:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P88573 and previous config saved to /var/cache/conftool/dbconfig/20260204-002518-marostegui.json
- 00:24 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1003.eqiad.wmnet with reason: host reimage
- 00:23 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1004.eqiad.wmnet with reason: host reimage
- 00:21 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-worker1001.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1002.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-worker1001.eqiad.wmnet with reason: host reimage
- 00:17 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 00:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P88572 and previous config saved to /var/cache/conftool/dbconfig/20260204-001509-marostegui.json
- 00:13 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on tools-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 00:11 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1003.eqiad.wmnet with OS trixie
- 00:11 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1004.eqiad.wmnet with OS trixie
- 00:09 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
- 00:09 jclark@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on tools-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
- 00:05 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1002.eqiad.wmnet with OS trixie
- 00:05 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-worker1001.eqiad.wmnet with OS trixie
- 00:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88571 and previous config saved to /var/cache/conftool/dbconfig/20260204-000501-marostegui.json
2026-02-03
- 23:57 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-ctrl1002.eqiad.wmnet with OS trixie
- 23:57 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host tools-k8s-ctrl1001.eqiad.wmnet with OS trixie
- 23:56 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2176 (T415786)', diff saved to https://phabricator.wikimedia.org/P88570 and previous config saved to /var/cache/conftool/dbconfig/20260203-235634-marostegui.json
- 23:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2176.codfw.wmnet with reason: Maintenance
- 23:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88569 and previous config saved to /var/cache/conftool/dbconfig/20260203-235609-marostegui.json
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1008
- 23:55 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1008
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1007
- 23:55 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1007
- 23:55 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1006
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1006
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1005
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1005
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1004
- 23:54 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1004
- 23:54 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1003
- 23:53 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1003
- 23:49 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88568 and previous config saved to /var/cache/conftool/dbconfig/20260203-234932-marostegui.json
- 23:49 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1194.eqiad.wmnet with reason: Maintenance
- 23:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88567 and previous config saved to /var/cache/conftool/dbconfig/20260203-234908-marostegui.json
- 23:48 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1002
- 23:48 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1002
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-worker1001
- 23:47 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-worker1001
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-ctrl1002
- 23:47 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-ctrl1002
- 23:47 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host tools-k8s-ctrl1001
- 23:46 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host tools-k8s-ctrl1001
- 23:45 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 23:45 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt tools-k8 - jclark@cumin1003"
- 23:45 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt tools-k8 - jclark@cumin1003"
- 23:41 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 23:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P88566 and previous config saved to /var/cache/conftool/dbconfig/20260203-234100-marostegui.json
- 23:40 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1015.eqiad.wmnet with OS bookworm
- 23:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P88565 and previous config saved to /var/cache/conftool/dbconfig/20260203-233400-marostegui.json
- 23:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P88564 and previous config saved to /var/cache/conftool/dbconfig/20260203-232552-marostegui.json
- 23:23 mutante: vrts1003 - fix systemd state: sed -i 's/vrts_rsync/rsync/' /lib/systemd/system/wmf_auto_restart_vrts_rsync.service ; systemctl daemon-reload - T416380 T135991
- 23:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P88563 and previous config saved to /var/cache/conftool/dbconfig/20260203-231851-marostegui.json
- 23:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88562 and previous config saved to /var/cache/conftool/dbconfig/20260203-231044-marostegui.json
- 23:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88561 and previous config saved to /var/cache/conftool/dbconfig/20260203-230343-marostegui.json
- 23:00 inflatador: bking@laptop roll-restarting wdqs codfw as it's lagging heavily
- 22:35 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 22:34 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 22:34 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
- 22:33 ryankemper@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
- 22:32 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1219 (T415786)', diff saved to https://phabricator.wikimedia.org/P88560 and previous config saved to /var/cache/conftool/dbconfig/20260203-223216-marostegui.json
- 22:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1219.eqiad.wmnet with reason: Maintenance
- 22:32 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88559 and previous config saved to /var/cache/conftool/dbconfig/20260203-223151-marostegui.json
- 22:31 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 22:29 ryankemper@deploy2002: helmfile [dse-k8s-codfw] DONE helmfile.d/admin 'apply'.
- 22:29 ryankemper@deploy2002: helmfile [dse-k8s-codfw] START helmfile.d/admin 'apply'.
- 22:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P88558 and previous config saved to /var/cache/conftool/dbconfig/20260203-222142-marostegui.json
- 22:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P88556 and previous config saved to /var/cache/conftool/dbconfig/20260203-221134-marostegui.json
- 22:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88555 and previous config saved to /var/cache/conftool/dbconfig/20260203-220126-marostegui.json
- 21:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1191 (T415786)', diff saved to https://phabricator.wikimedia.org/P88554 and previous config saved to /var/cache/conftool/dbconfig/20260203-215751-marostegui.json
- 21:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1191.eqiad.wmnet with reason: Maintenance
- 21:57 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88553 and previous config saved to /var/cache/conftool/dbconfig/20260203-215726-marostegui.json
- 21:54 dwisehaupt@dns1004: END - running authdns-update
- 21:52 dwisehaupt@dns1004: START - running authdns-update
- 21:42 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P88552 and previous config saved to /var/cache/conftool/dbconfig/20260203-214218-marostegui.json
- 21:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P88551 and previous config saved to /var/cache/conftool/dbconfig/20260203-212709-marostegui.json
- 21:26 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88550 and previous config saved to /var/cache/conftool/dbconfig/20260203-212616-marostegui.json
- 21:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2174.codfw.wmnet with reason: Maintenance
- 21:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88549 and previous config saved to /var/cache/conftool/dbconfig/20260203-212550-marostegui.json
- 21:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88548 and previous config saved to /var/cache/conftool/dbconfig/20260203-211201-marostegui.json
- 21:10 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P88547 and previous config saved to /var/cache/conftool/dbconfig/20260203-211041-marostegui.json
- 20:55 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P88545 and previous config saved to /var/cache/conftool/dbconfig/20260203-205532-marostegui.json
- 20:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88544 and previous config saved to /var/cache/conftool/dbconfig/20260203-204024-marostegui.json
- 20:30 cmooney@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1218 (T415786)', diff saved to https://phabricator.wikimedia.org/P88543 and previous config saved to /var/cache/conftool/dbconfig/20260203-202743-marostegui.json
- 20:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1218.eqiad.wmnet with reason: Maintenance
- 20:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88542 and previous config saved to /var/cache/conftool/dbconfig/20260203-202718-marostegui.json
- 20:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P88541 and previous config saved to /var/cache/conftool/dbconfig/20260203-201709-marostegui.json
- 20:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P88540 and previous config saved to /var/cache/conftool/dbconfig/20260203-200700-marostegui.json
- 20:01 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1181 (T415786)', diff saved to https://phabricator.wikimedia.org/P88539 and previous config saved to /var/cache/conftool/dbconfig/20260203-200130-marostegui.json
- 20:01 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1181.eqiad.wmnet with reason: Maintenance
- 20:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88538 and previous config saved to /var/cache/conftool/dbconfig/20260203-200106-marostegui.json
- 19:59 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS bookworm
- 19:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88537 and previous config saved to /var/cache/conftool/dbconfig/20260203-195652-marostegui.json
- 19:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P88536 and previous config saved to /var/cache/conftool/dbconfig/20260203-194557-marostegui.json
- 19:39 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:38 cmooney@cumin1003: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P88534 and previous config saved to /var/cache/conftool/dbconfig/20260203-193049-marostegui.json
- 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1024
- 19:28 cmooney@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1024
- 19:27 cmooney@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1024
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-fe1024.eqiad.wmnet 205.48.64.10.in-addr.arpa 5.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:27 cmooney@cumin1003: START - Cookbook sre.dns.wipe-cache ms-fe1024.eqiad.wmnet 205.48.64.10.in-addr.arpa 5.0.2.0.8.4.0.0.4.6.0.0.0.1.0.0.7.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:27 cmooney@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1024 - cmooney@cumin1003"
- 19:27 cmooney@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1024 - cmooney@cumin1003"
- 19:23 cmooney@cumin1003: START - Cookbook sre.dns.netbox
- 19:23 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-fe1021.eqiad.wmnet with OS bullseye
- 19:23 cmooney@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1024
- 19:23 cmooney@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1024.eqiad.wmnet with OS bullseye
- 19:21 sukhe@dns1004: END - running authdns-update
- 19:20 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:20 sukhe: testing authdns-update (NOOP run)
- 19:20 sukhe@dns1004: START - running authdns-update
- 19:19 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
- 19:19 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88533 and previous config saved to /var/cache/conftool/dbconfig/20260203-191541-marostegui.json
- 19:15 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 19:15 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 19:12 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-fe1023
- 19:12 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-fe1023
- 19:11 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host ms-fe1023
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-fe1023.eqiad.wmnet 170.32.64.10.in-addr.arpa 0.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:11 jclark@cumin1003: START - Cookbook sre.dns.wipe-cache ms-fe1023.eqiad.wmnet 170.32.64.10.in-addr.arpa 0.7.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 19:11 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1023 - jclark@cumin1003"
- 19:11 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-fe1023 - jclark@cumin1003"
- 19:09 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup1015.eqiad.wmnet with OS bookworm
- 19:04 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 19:04 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host ms-fe1023
- 19:04 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1023.eqiad.wmnet with OS bullseye
- 18:56 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host ms-fe1021.eqiad.wmnet with OS bullseye
- 18:53 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2173 (T415786)', diff saved to https://phabricator.wikimedia.org/P88532 and previous config saved to /var/cache/conftool/dbconfig/20260203-185326-marostegui.json
- 18:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2173.codfw.wmnet with reason: Maintenance
- 18:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88531 and previous config saved to /var/cache/conftool/dbconfig/20260203-185302-marostegui.json
- 18:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
- 18:52 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
- 18:49 swfrench@deploy2002: Finished scap sync-world: Rebuild deployment to pick up new production image (duration: 46m 41s)
- 18:39 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P88530 and previous config saved to /var/cache/conftool/dbconfig/20260203-183753-marostegui.json
- 18:25 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:23 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1206 (T415786)', diff saved to https://phabricator.wikimedia.org/P88529 and previous config saved to /var/cache/conftool/dbconfig/20260203-182302-marostegui.json
- 18:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1206.eqiad.wmnet with reason: Maintenance
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P88528 and previous config saved to /var/cache/conftool/dbconfig/20260203-182245-marostegui.json
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88527 and previous config saved to /var/cache/conftool/dbconfig/20260203-182238-marostegui.json
- 18:20 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P88526 and previous config saved to /var/cache/conftool/dbconfig/20260203-181229-marostegui.json
- 18:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88525 and previous config saved to /var/cache/conftool/dbconfig/20260203-180737-marostegui.json
- 18:07 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:06 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1174 (T415786)', diff saved to https://phabricator.wikimedia.org/P88524 and previous config saved to /var/cache/conftool/dbconfig/20260203-180650-marostegui.json
- 18:06 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:06 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1174.eqiad.wmnet with reason: Maintenance
- 18:04 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 18:03 swfrench@deploy2002: Started scap sync-world: Rebuild deployment to pick up new production image
- 18:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P88523 and previous config saved to /var/cache/conftool/dbconfig/20260203-180221-marostegui.json
- 17:55 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:53 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:52 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88522 and previous config saved to /var/cache/conftool/dbconfig/20260203-175213-marostegui.json
- 17:51 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:49 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host backup1015
- 17:49 jclark@cumin1003: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host backup1015
- 17:48 jclark@cumin1003: START - Cookbook sre.network.configure-switch-interfaces for host backup1015
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) backup1015.eqiad.wmnet 169.32.64.10.in-addr.arpa 9.6.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 17:48 jclark@cumin1003: START - Cookbook sre.dns.wipe-cache backup1015.eqiad.wmnet 169.32.64.10.in-addr.arpa 9.6.1.0.2.3.0.0.4.6.0.0.0.1.0.0.3.0.1.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 17:48 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host backup1015 - jclark@cumin1003"
- 17:48 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host backup1015 - jclark@cumin1003"
- 17:48 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:46 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns1004* or dns7001*}" "run-puppet-agent --enable 'merging CR 1230351'": T81605
- 17:46 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 17:46 swfrench-wmf: reprepro include php8.3_8.3.30-1+wmf11u2 in component/php83
- 17:45 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 17:45 jclark@cumin1003: START - Cookbook sre.hosts.move-vlan for host backup1015
- 17:45 jclark@cumin1003: START - Cookbook sre.hosts.reimage for host backup1015.eqiad.wmnet with OS bookworm
- 17:02 mutante: gerrit - deployed gerrit:1234269 to remove separate *qos* apache logs - deleted *qos* logs to fix disk space issues - back to 83% usage on / on gerrit1003
- 16:48 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
- 16:47 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
- 16:47 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
- 16:46 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
- 16:44 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
- 16:44 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
- 16:28 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1171.eqiad.wmnet with reason: Maintenance
- 16:28 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88521 and previous config saved to /var/cache/conftool/dbconfig/20260203-162833-marostegui.json
- 16:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88520 and previous config saved to /var/cache/conftool/dbconfig/20260203-161530-marostegui.json
- 16:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2170.codfw.wmnet with reason: Maintenance
- 16:15 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88519 and previous config saved to /var/cache/conftool/dbconfig/20260203-161506-marostegui.json
- 16:13 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P88517 and previous config saved to /var/cache/conftool/dbconfig/20260203-161325-marostegui.json
- 16:13 topranks: disable Hurricane Electric IPv6 BGP session on cr2-magru to troubleshoot ns2 IPv6 routing issue
- 16:11 tchin@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:10 tchin@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
- 16:06 tchin@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:05 tchin@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
- 16:04 tchin@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
- 16:04 tchin@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
- 15:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P88515 and previous config saved to /var/cache/conftool/dbconfig/20260203-155957-marostegui.json
- 15:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P88513 and previous config saved to /var/cache/conftool/dbconfig/20260203-155816-marostegui.json
- 15:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1196 (T415786)', diff saved to https://phabricator.wikimedia.org/P88511 and previous config saved to /var/cache/conftool/dbconfig/20260203-155713-marostegui.json
- 15:57 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
- 15:56 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1196.eqiad.wmnet with reason: Maintenance
- 15:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88510 and previous config saved to /var/cache/conftool/dbconfig/20260203-155628-marostegui.json
- 15:51 sfaci@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:50 sfaci@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/test-kitchen-next: apply
- 15:47 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org,service=authdns-ns2 [reason: testing authdns IPv6 change]
- 15:47 slyngshede@dns1004: END - running authdns-update
- 15:46 slyngshede@dns1004: START - running authdns-update
- 15:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P88508 and previous config saved to /var/cache/conftool/dbconfig/20260203-154619-marostegui.json
- 15:44 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P88507 and previous config saved to /var/cache/conftool/dbconfig/20260203-154449-marostegui.json
- 15:44 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org,service=authdns-ns2 [reason: testing authdns IPv6 change]
- 15:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88506 and previous config saved to /var/cache/conftool/dbconfig/20260203-154308-marostegui.json
- 15:39 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: testing authdns IPv6 change]
- 15:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P88505 and previous config saved to /var/cache/conftool/dbconfig/20260203-153611-marostegui.json
- 15:32 sukhe: sudo cumin "A:dnsbox" "disable-puppet 'merging CR 1230351'": T81605
- 15:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88504 and previous config saved to /var/cache/conftool/dbconfig/20260203-152941-marostegui.json
- 15:28 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: testing authdns IPv6 change]
- 15:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88503 and previous config saved to /var/cache/conftool/dbconfig/20260203-152602-marostegui.json
- 15:25 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:24 sukhe@cumin1003: END (PASS) - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns (exit_code=0) rolling restart_daemons on A:wikidough
- 15:18 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:17 cmooney@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.1 - enable IPv6 SAFI for DNS hosts - cmooney@cumin1003
- 15:16 cmooney@cumin1003: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1003.eqiad.wmnet with reason: Release v0.11.1 - enable IPv6 SAFI for DNS hosts - cmooney@cumin1003
- 15:15 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:15 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:12 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 15:10 sukhe@cumin1003: START - Cookbook sre.dns.roll-restart-reboot-wikimedia-dns rolling restart_daemons on A:wikidough
- 15:09 moritzm: installing openjdk-17 security updates
- 15:01 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:59 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:57 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 14:56 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:53 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:52 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:51 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 14:36 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:35 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:30 moritzm: installing bind9 security updates
- 14:26 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:25 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:25 elukey@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:24 ayounsi@cumin1003: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
- 14:24 ayounsi@cumin1003: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
- 14:24 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:23 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 14:04 elukey@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1170 (T415786)', diff saved to https://phabricator.wikimedia.org/P88501 and previous config saved to /var/cache/conftool/dbconfig/20260203-135840-marostegui.json
- 13:58 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1170.eqiad.wmnet with reason: Maintenance
- 13:58 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88500 and previous config saved to /var/cache/conftool/dbconfig/20260203-135813-marostegui.json
- 13:58 samtar@deploy2002: Finished scap sync-world: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294) (duration: 08m 03s)
- 13:53 samtar@deploy2002: samwilson, samtar: Continuing with sync
- 13:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:52 elukey@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:52 samtar@deploy2002: samwilson, samtar: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 13:52 elukey@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:50 samtar@deploy2002: Started scap sync-world: Backport for Remove unused SpecialMobileEditWatchlist::outputSubtitle() (T416294)
- 13:45 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2153 (T415786)', diff saved to https://phabricator.wikimedia.org/P88498 and previous config saved to /var/cache/conftool/dbconfig/20260203-134514-marostegui.json
- 13:45 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2153.codfw.wmnet with reason: Maintenance
- 13:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88497 and previous config saved to /var/cache/conftool/dbconfig/20260203-134445-marostegui.json
- 13:43 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P88496 and previous config saved to /var/cache/conftool/dbconfig/20260203-134303-marostegui.json
- 13:38 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1195 (T415786)', diff saved to https://phabricator.wikimedia.org/P88495 and previous config saved to /var/cache/conftool/dbconfig/20260203-133818-marostegui.json
- 13:38 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1195.eqiad.wmnet with reason: Maintenance
- 13:37 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88494 and previous config saved to /var/cache/conftool/dbconfig/20260203-133754-marostegui.json
- 13:31 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P88493 and previous config saved to /var/cache/conftool/dbconfig/20260203-132936-marostegui.json
- 13:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P88492 and previous config saved to /var/cache/conftool/dbconfig/20260203-132755-marostegui.json
- 13:27 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P88491 and previous config saved to /var/cache/conftool/dbconfig/20260203-132745-marostegui.json
- 13:21 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3]: Regular analytics weekly train [analytics/refinery@fc72bd31] (duration: 07m 11s)
- 13:20 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:20 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:19 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P88490 and previous config saved to /var/cache/conftool/dbconfig/20260203-131735-marostegui.json
- 13:16 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1024.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1023.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3]: Regular analytics weekly train [analytics/refinery@fc72bd31]
- 13:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P88489 and previous config saved to /var/cache/conftool/dbconfig/20260203-131424-marostegui.json
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1022.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:14 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3] (thin): Regular analytics weekly train THIN [analytics/refinery@fc72bd31] (duration: 01m 20s)
- 13:14 jclark@cumin1003: START - Cookbook sre.hosts.provision for host ms-fe1021.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 13:12 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3] (thin): Regular analytics weekly train THIN [analytics/refinery@fc72bd31]
- 13:12 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88488 and previous config saved to /var/cache/conftool/dbconfig/20260203-131245-marostegui.json
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 13:12 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ms-fe - jclark@cumin1003"
- 13:12 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt ms-fe - jclark@cumin1003"
- 13:12 joal@deploy2002: Finished deploy [analytics/refinery@fc72bd3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc72bd31] (duration: 01m 01s)
- 13:10 joal@deploy2002: Started deploy [analytics/refinery@fc72bd3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@fc72bd31]
- 13:08 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 13:07 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88487 and previous config saved to /var/cache/conftool/dbconfig/20260203-130724-marostegui.json
- 12:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88486 and previous config saved to /var/cache/conftool/dbconfig/20260203-125912-marostegui.json
- 12:28 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:25 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:24 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:22 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:22 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:20 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:17 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:09 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1158 (T415786)', diff saved to https://phabricator.wikimedia.org/P88485 and previous config saved to /var/cache/conftool/dbconfig/20260203-120905-marostegui.json
- 12:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
- 12:08 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1158.eqiad.wmnet with reason: Maintenance
- 12:06 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 12:06 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1223: After schema change
- 12:06 jclark@cumin1003: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:52 jclark@cumin1003: START - Cookbook sre.hosts.provision for host backup1015.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:47 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:47 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1015 - jclark@cumin1003"
- 11:47 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt backup1015 - jclark@cumin1003"
- 11:43 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:41 jclark@cumin1003: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host bast1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1186 (T415786)', diff saved to https://phabricator.wikimedia.org/P88481 and previous config saved to /var/cache/conftool/dbconfig/20260203-112156-marostegui.json
- 11:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1186.eqiad.wmnet with reason: Maintenance
- 11:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88480 and previous config saved to /var/cache/conftool/dbconfig/20260203-112130-marostegui.json
- 11:20 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1223: After schema change
- 11:20 jclark@cumin1003: START - Cookbook sre.hosts.provision for host bast1004.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
- 11:18 jclark@cumin1003: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
- 11:18 jclark@cumin1003: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 11:18 jclark@cumin1003: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt bast1004 - jclark@cumin1003"
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2146 (T415786)', diff saved to https://phabricator.wikimedia.org/P88478 and previous config saved to /var/cache/conftool/dbconfig/20260203-111636-marostegui.json
- 11:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2146.codfw.wmnet with reason: Maintenance
- 11:16 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88477 and previous config saved to /var/cache/conftool/dbconfig/20260203-111607-marostegui.json
- 11:12 jclark@cumin1003: START - Cookbook sre.dns.netbox
- 11:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P88476 and previous config saved to /var/cache/conftool/dbconfig/20260203-111120-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P88475 and previous config saved to /var/cache/conftool/dbconfig/20260203-110108-marostegui.json
- 11:01 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P88474 and previous config saved to /var/cache/conftool/dbconfig/20260203-110057-marostegui.json
- 10:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88473 and previous config saved to /var/cache/conftool/dbconfig/20260203-105059-marostegui.json
- 10:45 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P88472 and previous config saved to /var/cache/conftool/dbconfig/20260203-104547-marostegui.json
- 10:33 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Translate an article' 'Event:Celebrate Women/Translate an article' Ammarpad # T416031
- 10:30 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88471 and previous config saved to /var/cache/conftool/dbconfig/20260203-103037-marostegui.json
- 10:29 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2192: After schema change
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
- 10:29 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wmde: apply
- 10:28 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2214: After schema change
- 10:28 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2161: After schema change
- 10:28 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
- 10:27 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-wikidata: apply
- 10:26 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 10:24 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Suggested activities' 'Event:Celebrate Women/Suggested activities' Ammarpad # T416031
- 10:22 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-test-k8s: apply
- 10:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 10:17 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-sre: apply
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-search: apply
- 10:16 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Resources' 'Event:Celebrate Women/Resources' Ammarpad # T416031
- 10:16 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
- 10:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-research: apply
- 10:15 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:14 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-main: apply
- 10:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
- 10:13 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-platform-eng: apply
- 10:12 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
- 10:12 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-ml: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-dev: apply
- 10:11 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-product: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
- 10:10 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/postgresql-airflow-analytics-test: apply
- 10:09 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Learn how Wikipedia works' 'Event:Celebrate Women/Learn how Wikipedia works' Ammarpad # T416031
- 09:48 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1223.eqiad.wmnet with reason: Maintenance
- 09:44 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2192: After schema change
- 09:43 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2214: After schema change
- 09:42 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2161: After schema change
- 09:41 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1223 T416298', diff saved to https://phabricator.wikimedia.org/P88457 and previous config saved to /var/cache/conftool/dbconfig/20260203-094116-marostegui.json
- 09:40 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1189 to s3 primary T416298', diff saved to https://phabricator.wikimedia.org/P88456 and previous config saved to /var/cache/conftool/dbconfig/20260203-094038-marostegui.json
- 09:38 marostegui: Starting s3 eqiad failover from db1223 to db1189 - T416298
- 09:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T416298
- 09:37 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1189 with weight 0 T416298', diff saved to https://phabricator.wikimedia.org/P88455 and previous config saved to /var/cache/conftool/dbconfig/20260203-093736-marostegui.json
- 09:17 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Improve an article' 'Event:Celebrate Women/Improve an article' Ammarpad # T416031
- 09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.46.0-wmf.14 refs T413805
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1169 (T415786)', diff saved to https://phabricator.wikimedia.org/P88454 and previous config saved to /var/cache/conftool/dbconfig/20260203-091110-marostegui.json
- 09:11 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1169.eqiad.wmnet with reason: Maintenance
- 09:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T415786)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260203-091039-marostegui.json
- 09:00 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P88452 and previous config saved to /var/cache/conftool/dbconfig/20260203-090031-marostegui.json
- 08:59 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events/2025' 'Event:Celebrate Women/Events/2025' Ammarpad # T416031
- 08:50 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P88451 and previous config saved to /var/cache/conftool/dbconfig/20260203-085022-marostegui.json
- 08:49 moritzm: installing libcommons-lang3-java security updates
- 08:47 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2145 (T415786)', diff saved to https://phabricator.wikimedia.org/P88450 and previous config saved to /var/cache/conftool/dbconfig/20260203-084737-marostegui.json
- 08:47 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2145.codfw.wmnet with reason: Maintenance
- 08:45 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events/2024' 'Event:Celebrate Women/Events/2024' Ammarpad # T416031
- 08:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88449 and previous config saved to /var/cache/conftool/dbconfig/20260203-084014-marostegui.json
- 08:38 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Events' 'Event:Celebrate Women/Events' Ammarpad # T416031
- 08:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2003.wikimedia.org
- 08:27 moritzm: failover irc.wikimedia.org to irc1003.wikimedia.org
- 08:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2003.wikimedia.org
- 08:25 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Create an article' 'Event:Celebrate Women/Create an article' Ammarpad # T416031
- 08:21 jmm@dns1004: END - running authdns-update
- 08:20 jmm@dns1004: START - running authdns-update
- 08:19 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women/Add citations' 'Event:Celebrate Women/Add citations' Ammarpad # T416031
- 08:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1003.wikimedia.org
- 08:15 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1003.wikimedia.org
- 08:12 Ammar: Ran refreshImageMetadata.php for multiple files for T414643
- 07:34 ammarpad@deploy2002: mwscript-k8s job started: extensions/Translate/scripts/moveTranslatableBundle.php --wiki=metawiki --reason 'Requested at phab:T416031' 'Celebrate Women' 'Event:Celebrate Women' Ammarpad # T416031
- 07:24 moritzm: installing openssl security updates
- 07:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2141.codfw.wmnet with reason: Maintenance
- 06:55 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1163 (T415786)', diff saved to https://phabricator.wikimedia.org/P88448 and previous config saved to /var/cache/conftool/dbconfig/20260203-065541-marostegui.json
- 06:55 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1163.eqiad.wmnet with reason: Maintenance
- 06:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2161.codfw.wmnet with reason: schema change
- 06:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2192.codfw.wmnet with reason: Maintenance
- 06:14 marostegui@dns1006: END - running authdns-update
- 06:13 marostegui@dns1006: START - running authdns-update
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2192 T415900', diff saved to https://phabricator.wikimedia.org/P88447 and previous config saved to /var/cache/conftool/dbconfig/20260203-061142-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2213 to s5 primary and set section read-write T415900', diff saved to https://phabricator.wikimedia.org/P88446 and previous config saved to /var/cache/conftool/dbconfig/20260203-061025-marostegui.json
- 06:10 marostegui@cumin1003: dbctl commit (dc=all): 'Set s5 codfw as read-only for maintenance - T415900', diff saved to https://phabricator.wikimedia.org/P88445 and previous config saved to /var/cache/conftool/dbconfig/20260203-061002-marostegui.json
- 06:04 marostegui: Starting s5 codfw failover from db2192 to db2213 - T415900
- 06:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 23 hosts with reason: Primary switchover s5 T415900
- 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2213 with weight 0 T415900', diff saved to https://phabricator.wikimedia.org/P88444 and previous config saved to /var/cache/conftool/dbconfig/20260203-060411-marostegui.json
- 06:03 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2214.codfw.wmnet with reason: Maintenance
- 06:00 marostegui@dns1006: END - running authdns-update
- 06:00 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2214 T415862', diff saved to https://phabricator.wikimedia.org/P88443 and previous config saved to /var/cache/conftool/dbconfig/20260203-060000-marostegui.json
- 05:59 marostegui@dns1006: START - running authdns-update
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2229 to s6 primary and set section read-write T415862', diff saved to https://phabricator.wikimedia.org/P88442 and previous config saved to /var/cache/conftool/dbconfig/20260203-055844-marostegui.json
- 05:58 marostegui@cumin1003: dbctl commit (dc=all): 'Set s6 codfw as read-only for maintenance - T415862', diff saved to https://phabricator.wikimedia.org/P88441 and previous config saved to /var/cache/conftool/dbconfig/20260203-055823-marostegui.json
- 05:51 marostegui: Starting s6 codfw failover from db2214 to db2229 - T415862
- 05:50 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 22 hosts with reason: Primary switchover s6 T415862
- 05:50 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2229 with weight 0 T415862', diff saved to https://phabricator.wikimedia.org/P88440 and previous config saved to /var/cache/conftool/dbconfig/20260203-055010-marostegui.json
- 05:02 mwpresync@deploy2002: Pruned MediaWiki: 1.46.0-wmf.11 (duration: 02m 53s)
- 04:48 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.46.0-wmf.14 refs T413805 (duration: 44m 29s)
- 04:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.46.0-wmf.14 refs T413805
- 03:07 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2239.codfw.wmnet with reason: Maintenance
- 03:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88439 and previous config saved to /var/cache/conftool/dbconfig/20260203-030644-marostegui.json
- 02:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P88438 and previous config saved to /var/cache/conftool/dbconfig/20260203-025135-marostegui.json
- 02:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227', diff saved to https://phabricator.wikimedia.org/P88437 and previous config saved to /var/cache/conftool/dbconfig/20260203-023627-marostegui.json
- 02:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88436 and previous config saved to /var/cache/conftool/dbconfig/20260203-022119-marostegui.json
- 02:13 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 12m 39s)
- 02:05 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
- 00:15 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2227 (T415786)', diff saved to https://phabricator.wikimedia.org/P88435 and previous config saved to /var/cache/conftool/dbconfig/20260203-001511-marostegui.json
- 00:15 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2227.codfw.wmnet with reason: Maintenance
- 00:14 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88434 and previous config saved to /var/cache/conftool/dbconfig/20260203-001445-marostegui.json
- 00:04 robh: eqsin cp5022 troubleshooting onsite in progress
2026-02-02
- 23:59 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P88433 and previous config saved to /var/cache/conftool/dbconfig/20260202-235937-marostegui.json
- 23:44 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P88432 and previous config saved to /var/cache/conftool/dbconfig/20260202-234429-marostegui.json
- 23:29 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88431 and previous config saved to /var/cache/conftool/dbconfig/20260202-232921-marostegui.json
- 22:40 herron: added 500G to the lv on mwlog1002
- 22:24 inflatador: bking@apt1002 `sudo -E reprepro -C thirdparty/opensearch3 copy trixie-wikimedia bookworm-wikimedia opensearch`
- 22:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
- 22:19 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88430 and previous config saved to /var/cache/conftool/dbconfig/20260202-221912-marostegui.json
- 22:04 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P88429 and previous config saved to /var/cache/conftool/dbconfig/20260202-220404-marostegui.json
- 21:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P88428 and previous config saved to /var/cache/conftool/dbconfig/20260202-214855-marostegui.json
- 21:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88427 and previous config saved to /var/cache/conftool/dbconfig/20260202-213347-marostegui.json
- 21:27 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2209 (T415786)', diff saved to https://phabricator.wikimedia.org/P88426 and previous config saved to /var/cache/conftool/dbconfig/20260202-212703-marostegui.json
- 21:26 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
- 21:26 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88425 and previous config saved to /var/cache/conftool/dbconfig/20260202-212638-marostegui.json
- 21:16 kemayo@deploy2002: Finished scap sync-world: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872) (duration: 10
- 21:12 kemayo@deploy2002: tgr, func, kemayo, esanders: Continuing with sync
- 21:11 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P88424 and previous config saved to /var/cache/conftool/dbconfig/20260202-211129-marostegui.json
- 21:07 kemayo@deploy2002: tgr, func, kemayo, esanders: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872) synced to
- 21:05 kemayo@deploy2002: Started scap sync-world: Backport for Edit check: turn off the tone a/b test on frwiki, jawiki, ptwiki (T411914), Enable suggestions BetaFeature on beta wikis (T415504), WikimediaCustomizations: Set WMCBadEmailDomainsFile (T397244), filebackend: Clean up removed config params for multi-write backends (T328872)
- 20:56 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P88423 and previous config saved to /var/cache/conftool/dbconfig/20260202-205621-marostegui.json
- 20:41 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88422 and previous config saved to /var/cache/conftool/dbconfig/20260202-204113-marostegui.json
- 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1212 (T415786)', diff saved to https://phabricator.wikimedia.org/P88421 and previous config saved to /var/cache/conftool/dbconfig/20260202-202451-marostegui.json
- 20:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on 6 hosts with reason: Maintenance
- 20:24 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
- 20:24 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88420 and previous config saved to /var/cache/conftool/dbconfig/20260202-202404-marostegui.json
- 20:08 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P88419 and previous config saved to /var/cache/conftool/dbconfig/20260202-200855-marostegui.json
- 19:53 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P88418 and previous config saved to /var/cache/conftool/dbconfig/20260202-195345-marostegui.json
- 19:38 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88417 and previous config saved to /var/cache/conftool/dbconfig/20260202-193837-marostegui.json
- 18:42 bvibber@deploy2002: helmfile [eqiad] DONE helmfile.d/services/chart-renderer: apply
- 18:42 bvibber@deploy2002: helmfile [eqiad] START helmfile.d/services/chart-renderer: apply
- 18:41 bvibber@deploy2002: helmfile [codfw] DONE helmfile.d/services/chart-renderer: apply
- 18:41 bvibber@deploy2002: helmfile [codfw] START helmfile.d/services/chart-renderer: apply
- 18:40 bvibber@deploy2002: helmfile [staging] DONE helmfile.d/services/chart-renderer: apply
- 18:40 bvibber@deploy2002: helmfile [staging] START helmfile.d/services/chart-renderer: apply
- 18:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2194 (T415786)', diff saved to https://phabricator.wikimedia.org/P88416 and previous config saved to /var/cache/conftool/dbconfig/20260202-183312-marostegui.json
- 18:33 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
- 18:32 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88415 and previous config saved to /var/cache/conftool/dbconfig/20260202-183248-marostegui.json
- 18:22 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1198 (T415786)', diff saved to https://phabricator.wikimedia.org/P88414 and previous config saved to /var/cache/conftool/dbconfig/20260202-182210-marostegui.json
- 18:22 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
- 18:21 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88413 and previous config saved to /var/cache/conftool/dbconfig/20260202-182144-marostegui.json
- 18:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P88412 and previous config saved to /var/cache/conftool/dbconfig/20260202-181739-marostegui.json
- 18:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P88411 and previous config saved to /var/cache/conftool/dbconfig/20260202-180633-marostegui.json
- 18:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P88410 and previous config saved to /var/cache/conftool/dbconfig/20260202-180230-marostegui.json
- 17:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P88409 and previous config saved to /var/cache/conftool/dbconfig/20260202-175125-marostegui.json
- 17:47 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88408 and previous config saved to /var/cache/conftool/dbconfig/20260202-174721-marostegui.json
- 17:36 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88407 and previous config saved to /var/cache/conftool/dbconfig/20260202-173616-marostegui.json
- 16:49 elukey@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-sre: sync
- 16:48 elukey@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-sre: sync
- 16:42 dancy@deploy2002: Installation of scap version "4.241.0" completed for 2 hosts
- 16:40 dancy@deploy2002: Installing scap version "4.241.0" for 2 host(s)
- 16:20 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1189 (T415786)', diff saved to https://phabricator.wikimedia.org/P88406 and previous config saved to /var/cache/conftool/dbconfig/20260202-162042-marostegui.json
- 16:20 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
- 16:20 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88405 and previous config saved to /var/cache/conftool/dbconfig/20260202-162017-marostegui.json
- 16:05 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20260202-160504-marostegui.json
- 15:53 root@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Primary switchover test-s4 None
- 15:49 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P88403 and previous config saved to /var/cache/conftool/dbconfig/20260202-154956-marostegui.json
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2190 (T415786)', diff saved to https://phabricator.wikimedia.org/P88402 and previous config saved to /var/cache/conftool/dbconfig/20260202-154038-marostegui.json
- 15:40 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
- 15:40 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88401 and previous config saved to /var/cache/conftool/dbconfig/20260202-154013-marostegui.json
- 15:34 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88400 and previous config saved to /var/cache/conftool/dbconfig/20260202-153447-marostegui.json
- 15:25 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P88399 and previous config saved to /var/cache/conftool/dbconfig/20260202-152503-marostegui.json
- 15:19 moritzm: restarting Mailman on lists1004 to pick up openssl security updates
- 15:13 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:11 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 15:10 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 15:09 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P88398 and previous config saved to /var/cache/conftool/dbconfig/20260202-150955-marostegui.json
- 15:07 moritzm: restarting Exim on lists1004 to pick up openssl security updates
- 15:00 moritzm: restarting mailman-web on lists1004 to pick up openssl security updates
- 15:00 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:59 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:58 Lucas_WMDE: UTC afternoon backport+config window done
- 14:56 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable Wikibase GraphQL on beta wikidata (T415516) (duration: 10m 30s)
- 14:54 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88397 and previous config saved to /var/cache/conftool/dbconfig/20260202-145445-marostegui.json
- 14:52 lucaswerkmeister-wmde@deploy2002: jakob, lucaswerkmeister-wmde: Continuing with sync
- 14:51 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:48 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:47 lucaswerkmeister-wmde@deploy2002: jakob, lucaswerkmeister-wmde: Backport for Enable Wikibase GraphQL on beta wikidata (T415516) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:45 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable Wikibase GraphQL on beta wikidata (T415516)
- 14:44 arnoldokoth: restart vrts-daemon on vrts1003
- 14:39 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:36 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:35 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
- 14:33 dpogorzelski@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
- 14:27 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335) (duration: 07m 51s)
- 14:23 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stang: Continuing with sync
- 14:21 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, stang: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:19 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for zhwiki: Remove extra autoconfirmed limit for Tor user (T415335)
- 14:19 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1175 (T415786)', diff saved to https://phabricator.wikimedia.org/P88396 and previous config saved to /var/cache/conftool/dbconfig/20260202-141910-marostegui.json
- 14:19 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
- 14:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88395 and previous config saved to /var/cache/conftool/dbconfig/20260202-141844-marostegui.json
- 14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Update ext-EventStreamConfig (T415638) (duration: 10m 45s)
- 14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, joal: Continuing with sync
- 14:08 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, joal: Backport for Update ext-EventStreamConfig (T415638) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 14:07 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:06 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Update ext-EventStreamConfig (T415638)
- 14:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
- 14:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P88394 and previous config saved to /var/cache/conftool/dbconfig/20260202-140336-marostegui.json
- 14:02 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
- 13:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P88393 and previous config saved to /var/cache/conftool/dbconfig/20260202-134827-marostegui.json
- 13:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88392 and previous config saved to /var/cache/conftool/dbconfig/20260202-133319-marostegui.json
- 13:27 moritzm: installing PostgreSQL 15 security updates
- 13:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
- 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.hiddenparma (exit_code=0) Hiddenparma deployment to the alerting hosts with reason: "Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003"
- 13:17 oblivian@cumin1003: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003
- 13:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
- 13:16 oblivian@cumin1003: START - Cookbook sre.deploy.python-code hiddenparma to alert[1002,2002].wikimedia.org with reason: Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003
- 13:16 oblivian@cumin1003: START - Cookbook sre.deploy.hiddenparma Hiddenparma deployment to the alerting hosts with reason: "Fix bugs with no reason policy and haproxy actions - oblivian@cumin1003"
- 12:55 moritzm: restarting Postfix on the MXes to pick up OpenSSL security updates
- 12:54 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1193: After schema change
- 12:54 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1193: After schema change
- 12:46 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.newpool (exit_code=99) pool db1193: After schema change
- 12:45 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db1222: After schema change
- 12:37 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2177 (T415786)', diff saved to https://phabricator.wikimedia.org/P88389 and previous config saved to /var/cache/conftool/dbconfig/20260202-123726-marostegui.json
- 12:37 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
- 12:37 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88388 and previous config saved to /var/cache/conftool/dbconfig/20260202-123712-marostegui.json
- 12:37 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Samuel (WMF) out of all services on: 2487 hosts
- 12:33 moritzm: restarting nginx on puppetdb hosts
- 12:31 jmm@cumin2002: DONE (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: sretest2006.codfw.wmnet
- 12:30 slyngshede@dns1004: END - running authdns-update
- 12:29 slyngshede@dns1004: START - running authdns-update
- 12:27 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry (exit_code=0) rolling restart_daemons on A:docker-registry
- 12:22 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P88385 and previous config saved to /var/cache/conftool/dbconfig/20260202-122203-marostegui.json
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1166 (T415786)', diff saved to https://phabricator.wikimedia.org/P88384 and previous config saved to /var/cache/conftool/dbconfig/20260202-121735-marostegui.json
- 12:17 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
- 12:17 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88383 and previous config saved to /var/cache/conftool/dbconfig/20260202-121707-marostegui.json
- 12:08 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-docker-registry rolling restart_daemons on A:docker-registry
- 12:06 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P88380 and previous config saved to /var/cache/conftool/dbconfig/20260202-120654-marostegui.json
- 12:02 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P88379 and previous config saved to /var/cache/conftool/dbconfig/20260202-120157-marostegui.json
- 12:00 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1193: After schema change
- 12:00 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1222: After schema change
- 11:58 marostegui@cumin1003: END (FAIL) - Cookbook sre.mysql.newpool (exit_code=99) pool db1222: After schema change
- 11:57 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db1222: After schema change
- 11:51 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88376 and previous config saved to /var/cache/conftool/dbconfig/20260202-115142-marostegui.json
- 11:46 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P88375 and previous config saved to /var/cache/conftool/dbconfig/20260202-114648-marostegui.json
- 11:46 slyngshede@cumin1003: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AUgolnikova out of all services on: 2487 hosts
- 11:31 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88374 and previous config saved to /var/cache/conftool/dbconfig/20260202-113139-marostegui.json
- 11:16 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-eqiad
- 11:14 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-eqiad
- 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas (exit_code=0) rolling restart_daemons on A:schema-codfw
- 11:06 jmm@cumin2002: START - Cookbook sre.misc-clusters.roll-restart-reboot-eventschemas rolling restart_daemons on A:schema-codfw
- 10:45 moritzm: restarting Bitu on idm*
- 10:36 marostegui@cumin1003: END (PASS) - Cookbook sre.mysql.newpool (exit_code=0) pool db2249: After reimage
- 10:20 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 10:17 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db1157 (T415786)', diff saved to https://phabricator.wikimedia.org/P88371 and previous config saved to /var/cache/conftool/dbconfig/20260202-101658-marostegui.json
- 10:16 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
- 09:51 marostegui@cumin1003: START - Cookbook sre.mysql.newpool pool db2249: After reimage
- 09:50 dpogorzelski@cumin1003: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 09:46 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2249.codfw.wmnet with OS trixie
- 09:45 ihurbain@deploy2002: Finished scap sync-world: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328) (duration: 06m 36s)
- 09:40 ihurbain@deploy2002: reedy, cscott, ihurbain: Continuing with sync
- 09:40 ihurbain@deploy2002: reedy, cscott, ihurbain: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 09:39 dpogorzelski@cumin1003: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster ml-staging-codfw: Kubernetes upgrade
- 09:38 ihurbain@deploy2002: Started scap sync-world: Backport for Upgrading psy/psysh (v0.12.10 => v0.12.19) (T416050), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415328), Bump wikimedia/parsoid to 0.23.0-a13.1 (T415888 T415328)
- 09:35 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-staging-codfw: maintenance
- 09:35 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-staging-codfw: maintenance
- 09:34 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2156 (T415786)', diff saved to https://phabricator.wikimedia.org/P88368 and previous config saved to /var/cache/conftool/dbconfig/20260202-093418-marostegui.json
- 09:34 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
- 09:33 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88367 and previous config saved to /var/cache/conftool/dbconfig/20260202-093354-marostegui.json
- 09:33 dpogorzelski@cumin1003: END (PASS) - Cookbook sre.k8s.pool-depool-cluster (exit_code=0) depool all services in codfw/ml-staging-codfw: maintenance
- 09:33 dpogorzelski@cumin1003: START - Cookbook sre.k8s.pool-depool-cluster depool all services in codfw/ml-staging-codfw: maintenance
- 09:27 elukey: clean up nginx-related packages and configs from urldownloader hosts to clear alerts - T405631
- 09:24 marostegui@cumin1003: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2249.codfw.wmnet with reason: host reimage
- 09:18 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P88366 and previous config saved to /var/cache/conftool/dbconfig/20260202-091845-marostegui.json
- 09:17 marostegui@cumin1003: START - Cookbook sre.hosts.downtime for 2:00:00 on db2249.codfw.wmnet with reason: host reimage
- 09:07 marostegui@cumin1003: START - Cookbook sre.hosts.reimage for host db2249.codfw.wmnet with OS trixie
- 09:04 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db2249.codfw.wmnet with reason: Reimage to debian trixie
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P88365 and previous config saved to /var/cache/conftool/dbconfig/20260202-090337-marostegui.json
- 09:03 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2249 T415358', diff saved to https://phabricator.wikimedia.org/P88364 and previous config saved to /var/cache/conftool/dbconfig/20260202-090328-marostegui.json
- 08:56 kharlan@deploy2002: Finished scap sync-world: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354) (duration: 10m 05s)
- 08:50 kharlan@deploy2002: kharlan: Continuing with sync
- 08:48 kharlan@deploy2002: kharlan: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:48 marostegui@cumin1003: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88363 and previous config saved to /var/cache/conftool/dbconfig/20260202-084806-marostegui.json
- 08:46 kharlan@deploy2002: Started scap sync-world: Backport for BlockUtils: Log x-provenance and IP reputation fields (T415354)
- 08:45 kharlan@deploy2002: Finished scap sync-world: Backport for Enable watchlist labels everywhere (prod and beta) (T413967) (duration: 41m 47s)
- 08:31 kharlan@deploy2002: kharlan, samwilson: Continuing with sync
- 08:27 kharlan@deploy2002: kharlan, samwilson: Backport for Enable watchlist labels everywhere (prod and beta) (T413967) synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.
- 08:12 moritzm: installing openssl security updates
- 08:09 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
- 08:04 kharlan@deploy2002: Started scap sync-world: Backport for Enable watchlist labels everywhere (prod and beta) (T413967)
- 08:02 joal: Restarting druid middle-managers to recover from OOM - T415799
- 06:33 marostegui@cumin1003: dbctl commit (dc=all): 'Depooling db2149 (T415786)', diff saved to https://phabricator.wikimedia.org/P88361 and previous config saved to /var/cache/conftool/dbconfig/20260202-063304-marostegui.json
- 06:32 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
- 06:27 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1222.eqiad.wmnet with reason: Maintenance
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1222 T415983', diff saved to https://phabricator.wikimedia.org/P88360 and previous config saved to /var/cache/conftool/dbconfig/20260202-062554-marostegui.json
- 06:25 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1162 to s2 primary T415983', diff saved to https://phabricator.wikimedia.org/P88359 and previous config saved to /var/cache/conftool/dbconfig/20260202-062522-marostegui.json
- 06:23 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T415983
- 06:22 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1162 with weight 0 T415983', diff saved to https://phabricator.wikimedia.org/P88358 and previous config saved to /var/cache/conftool/dbconfig/20260202-062212-marostegui.json
- 06:21 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T415983
- 06:14 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2161.codfw.wmnet with reason: long schema change
- 06:13 marostegui@dns1006: END - running authdns-update
- 06:13 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db2161 T415748', diff saved to https://phabricator.wikimedia.org/P88357 and previous config saved to /var/cache/conftool/dbconfig/20260202-061310-marostegui.json
- 06:12 marostegui@dns1006: START - running authdns-update
- 06:12 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db2165 to s8 primary and set section read-write T415748', diff saved to https://phabricator.wikimedia.org/P88356 and previous config saved to /var/cache/conftool/dbconfig/20260202-061217-marostegui.json
- 06:11 marostegui@cumin1003: dbctl commit (dc=all): 'Set s8 codfw as read-only for maintenance - T415748', diff saved to https://phabricator.wikimedia.org/P88355 and previous config saved to /var/cache/conftool/dbconfig/20260202-061150-marostegui.json
- 06:11 marostegui: Starting s8 codfw failover from db2161 to db2165 - T415748
- 06:04 marostegui@cumin1003: dbctl commit (dc=all): 'Set db2165 with weight 0 T415748', diff saved to https://phabricator.wikimedia.org/P88354 and previous config saved to /var/cache/conftool/dbconfig/20260202-060437-marostegui.json
- 06:02 marostegui: Deploy schema change on old s8 eqiad master db1193 T411164 T411163
- 05:59 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1193.eqiad.wmnet with reason: long schema change
- 05:57 marostegui@cumin1003: dbctl commit (dc=all): 'Depool db1193 T416107', diff saved to https://phabricator.wikimedia.org/P88353 and previous config saved to /var/cache/conftool/dbconfig/20260202-055755-marostegui.json
- 05:57 marostegui@cumin1003: dbctl commit (dc=all): 'Promote db1209 to s8 primary T416107', diff saved to https://phabricator.wikimedia.org/P88352 and previous config saved to /var/cache/conftool/dbconfig/20260202-055717-marostegui.json
- 05:56 marostegui: Starting s8 eqiad failover from db1193 to db1209 - T416107
- 05:53 marostegui@cumin1003: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s8 T416107
- 05:53 marostegui@cumin1003: dbctl commit (dc=all): 'Set db1209 with weight 0 T416107', diff saved to https://phabricator.wikimedia.org/P88351 and previous config saved to /var/cache/conftool/dbconfig/20260202-055304-marostegui.json
- 02:14 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 13m 22s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
2026-02-01
- 02:01 mwpresync@deploy2002: Finished scap build-images: Publishing wmf/next image (duration: 01m 14s)
- 02:00 mwpresync@deploy2002: Started scap build-images: Publishing wmf/next image
Other archives
2000s
- Archive 1: 2004 Jun - 2004 Sep
- Archive 2: 2004 Oct - 2004 Nov
- Archive 3: 2004 Dec - 2005 Mar
- Archive 4: 2005 Apr - 2005 Jul
- Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
- Archive 6: 2005 Nov - 2006 Feb
- Archive 7: 2006 Mar - 2006 Jun
- Archive 8: 2006 Jul - 2006 Sep
- Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
- Archive 10: 2007 Feb - 2007 Jun
- Archive 11: 2007 Jul - 2007 Dec
- Archive 12: 2008 Jan - 2008 Jul
- Archive 12a: 2008 Aug
- Archive 12b: 2008 Sep
- Archive 13: 2008 Oct - 2009 Jun
- Archive 14: 2009 Jun - 2009 Dec
2010s
- Archive 15: 2010 Jan - 2010 Jun
- Archive 16: 2010 Jul - 2010 Oct
- Archive 17: 2010 Nov - 2010 Dec
- Archive 18: 2011 Jan - 2011 Jun
- Archive 19: 2011 Jul - 2011 Dec
- Archive 20: 2011 Dec - 2012 Jun, with revision history 2007-02-21 to 2012-03-27
- Archive 21: 2012 Jul - 2013 Jan
- Archive 22: 2013 Jan - 2013 Jul
- Archive 23: 2013 Aug - 2013 Dec
- Archive 24: 2014 Jan - 2014 Mar
- Archive 25: 2014 April - 2014 September
- Archive 26: 2014 October - 2014 December
- Archive 27: 2015 January - 2015 July
- Archive 28: 2015 August - 2015 December
- Archive 29: 2016 January - 2016 May
- Archive 30: 2016 June - 2016 August
- Archive 31: 2016 September - 2016 December
- Archive 32: 2017 January - 2017 July
- Archive 33: 2017 August - 2017 December
- Archive 34: 2018 January - 2018 April
- Archive 35: 2018 May - 2018 August
- Archive 36: 2018 September - 2018 December
- Archive 37: 2019 January - 2019 April
- Archive 38: 2019 May - 2019 August
- Archive 39: 2019 September - 2019 December
2020-2024
- Archive 40: 2020 January - 2020 April
- Archive 41: 2020 May - 2020 July
- Archive 42: 2020 August - 2020 November
- Archive 43: 2020 December
- Archive 44: 2021 January - 2021 April
- Archive 45: 2021 May - 2021 July
- Archive 46: 2021 August - 2021 October
- Archive 47: 2021 November - 2021 December
- Archive 48: 2022 January
- Archive 49: 2022 February
- Archive 50: 2022 March
- Archive 51: 2022 April 1-15
- Archive 52: 2022 April 16-30
- Archive 53: 2022 May
- Archive 54: 2022 June
- Archive 55: 2022 July
- Archive 56: 2022 August
- Archive 57: 2022 September
- Archive 58: 2022 October
- Archive 59: 2022 November 1-15
- Archive 60: 2022 November 16-30
- Archive 61: 2022 December
- Archive 62: 2023 January
- Archive 63: 2023 February
- Archive 64: 2023 March
- Archive 65: 2023 April
- Archive 66: 2023 May
- Archive 67: 2023 June
- Archive 68: 2023 July
- Archive 69: 2023 August 1-15
- Archive 70: 2023 August 16-31
- Archive 71: 2023 September
- Archive 72: 2023 October
- Archive 73: 2023 November
- Archive 74: 2023 December
- Archive 75: 2024 January
- Archive 76: 2024 February
- Archive 77: 2024 March
- Archive 78: 2024 April
- Archive 79: 2024 May 1-15
- Archive 80: 2024 May 16-31
- Archive 81: 2024 June 1-15
- Archive 82: 2024 June 16-30
- Archive 83: 2024 July
- Archive 84: 2024 August
- Archive 85: 2024 September
- Archive 86: 2024 October
- Archive 87: 2024 November
- Archive 88: 2024 December
2025-present
- Archive 89: 2025 January
- Archive 90: 2025 February
- Archive 91: 2025 March
- Archive 92: 2025 April
- Archive 93: 2025 May
- Archive 94: 2025 June
- Archive 95: 2025 July
- Archive 96: 2025 August
- Archive 97: 2025 September
- Archive 98: 2025 October
- Archive 99: 2025 November
- Archive 100: 2025 December
- Archive 101: 2026 January
- Archive 102: 2026 February
- Archive 103: 2026 March